Methods and algorithms for aiding in the detection of cancer

ABSTRACT

A method of data interpretation from a multiplex cancer assay is described. The aggregate normalized score from the assay is transformed to a quantitative risk score quantifying a human subject's increased risk for the presence of cancer as compared to the known prevalence of the cancer in the population before testing the subject.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a Continuation Application of U.S. Ser. No. 15/483,218, filed 10 Apr. 2017, which is a Divisional Application of U.S. Ser. No. 13/718,457, filed 18 Dec. 2012, now U.S. Pat. No. 9,753,043, which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 61/577,083, filed 18 Dec. 2011, the entire contents of which are each incorporated herein by reference.

FIELD OF THE INVENTION

The disclosure relates to methods and algorithms for quantifying an increased risk for the presence of cancer in an asymptomatic human subject.

BACKGROUND OF THE INVENTION

Early Detection of Cancer

It is well established that for most cancers patient outcomes improve significantly if surgery and other therapeutic interventions commence before the tumor has metastasized. Accordingly, many different techniques and technologies have been introduced into medical practice in an attempt to help physicians detect cancer early. These include various imaging modalities such as mammography as well as tests to identify cancer-specific “biomarkers” in the blood and other bodily fluids, such as the prostate specific antigen (PSA) test. The utility and value of many of these tests are often questioned, particularly with regard to whether the costs and risks associated with false positives, false negatives, etc. outweigh the potential benefits in terms of actual lives saved.

Cancer detection poses significant technical challenges as compared to detecting infections since cancer cells, unlike viruses and bacteria, are biologically similar to and hard to distinguish from normal, healthy cells. For this reason tests used for the early detection of cancer often suffer from higher numbers of false positives and false negatives than comparable tests for viral or bacterial infections or tests that measure genetic, enzymatic or hormonal abnormalities. This often causes confusion among healthcare practitioners and their patients, leading in some cases to unnecessary, expensive, and invasive follow-on testing and in other cases to a complete disregard for follow-up testing, resulting in cancers detected too late for useful intervention. To be sure, physicians and patients welcome tests that yield a binary decision or result, in which the patient is either positive or negative for a condition, such as observed in over-the-counter pregnancy test kits, which present, for example, an immunoassay result in the shape of a plus sign or a negative sign as an indication of pregnancy or not. However, unless the sensitivity and specificity of diagnosis approach 99%, a level not obtainable for most cancer tests, such binary outputs can be highly misleading.

A need therefore exists for an approach or method that communicates to patients their relative risk of having cancer in a way that is clear and quantitative but avoids reporting results in “black or white” terms that can lead either to excessive worry or undue complacency. In this way, the risk of having a particular cancer can be defined in a way that allows a physician to prioritize and target those higher risk patients in need of follow-up testing, as distinct from those at lower risk. Such an approach would not only save lives and costs, but would allow for a more personalized approach to screening and identify those patients most likely to benefit from expensive and invasive follow-on testing. Primary care providers in particular typically see a high volume of patients per day, and the demands of healthcare cost containment have dramatically shortened the amount of time they can spend with each patient. Accordingly, they often lack sufficient time to take in-depth family and lifestyle histories, to counsel patients on healthy lifestyles, or to follow up with patients who have been recommended testing beyond that which is provided in their office practice.

It would, furthermore, be desirable that the aforementioned approach or method be more precise and accurate than mere epidemiological or lifestyle considerations. It is well known that factors such as age, family history, tobacco and alcohol use, diet and obesity impact the likelihood of having cancer in particular individuals. However, these factors alone provide, at best, a crude and subjective way for physicians to stratify the cancer risks among their patient population.

Others have provided algorithms wherein an individual can attempt to personalize their risk, without any testing, simply by providing relevant personal history such as age and their current smoking status. However, while these algorithms may be more accurate than relying on the reported rate of cancer in a particular group, they do not take into account an individual's actual biological factors.

Thus, it would be desirable to provide a technique and method that overcomes the aforementioned limitations and that quantifies an individual's risk as compared to their risk before testing.

Lung Cancer and Early Detection

Lung cancer is by far the leading cause of cancer deaths in North America and most of the world, killing more people than the next three most lethal cancers combined, namely breast, prostate, and colorectal cancer. Lung cancer results in over 156,000 deaths per year in the United States alone (American Cancer Society. Cancer Facts & Figures 2011. Atlanta: American Cancer Society; 2011). Tobacco use has been identified as a primary causal factor for lung cancer and is thought to account for some 90% of cases. Thus, individuals over 50 years of age with a smoking history of greater than 20 pack-years have a 1 in 7 lifetime risk of developing the disease. Lung cancer is a relatively silent disease displaying few if any specific symptoms until it reaches the later more advanced stages. Therefore most patients are not diagnosed until their cancer has metastasized beyond the lung and they are no longer treatable by surgery alone. Thus, while the best way to prevent lung cancer is likely tobacco avoidance or cessation, for many current and former smokers, the transforming, cancer-causing event has already occurred and even though the cancer is not yet manifest, the damage is already done. Thus, perhaps the most effective means of reducing lung cancer mortality today is early stage detection when the tumor is still localized and amenable to surgery with intent to cure.

The importance of early detection was recently demonstrated in a large 7-year clinical study, the National Lung Screening Trial (NLST), which compared chest x-ray and chest CT scanning as potential modalities for the early detection of lung cancer (National Lung Screening Trial Research Team, Aberle D R, Adams A M, Berg C D, Black W C, Clapp J D, Fagerstrom R M, Gareen I F, Gatsonis C, Marcus P M, Sicks J D. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011 Aug. 4; 365(5):395-409). The trial concluded that the use of chest CT scans to screen the at-risk population identified significantly more early stage lung cancers than chest x-ray and resulted in a 20% overall reduction in disease mortality. This study has clearly indicated that identifying lung cancer early can save lives. Unfortunately, the broad application of CT scanning as a screening method for lung cancer is not without problems. The NLST design utilized a serial CT screening paradigm in which patients received a CT scan annually for only three years. Nearly 40% of the participants receiving the annual CT scan over 3 years had at least one positive screening result and 96.4% of these positive screening results were false positives. This very high rate of false positives can cause patient anxiety and a burden on the healthcare system, as the work-up following a positive finding on low-dose CT scans often includes advanced imaging and biopsies. Although CT scanning is an important tool for the early detection of lung cancer, more than two years after the NLST results were announced, very few patients at high risk for lung cancer due to smoking history have initiated a program of annual CT scans. This reluctance to undergo yearly CT scans is likely due to a number of factors including costs, perceived risks of radiation exposure especially by serial CT scans, the inconvenience or burden to asymptomatic patients of scheduling a separate diagnostics procedure at a radiology center, as well as concerns by physicians that the very high false positive rates of CT scanning as a standalone test will result in a significant number of unnecessary follow-up diagnostic tests and invasive procedures.

While the overall lifetime risk for lung cancer amongst smokers is high, the chance that any individual smoker has cancer at a specific point in time is only on the order of 1.5-2.7% [Bach, P. B., et al., Screening for Lung Cancer: ACCP Evidence-Based Clinical Practice Guidelines (2nd Edition). CHEST Journal, 2007. 132(3_suppl): p. 69S-77S.]. Due to this low disease prevalence, a simple method to better identify which patients are at highest risk is necessary. The ideal method would be non-invasive, highly accurate and easily performed in the context of the standard work-up of the patient at a yearly physician visit with the standard blood work-up. Such a test needs to have at least a moderate level of sensitivity and be amenable to serial testing with a high level of patient compliance. The best format for such a test that meets all of these requirements is a simple blood test.

It would be desirable to have such a blood test for lung cancer in asymptomatic, at-risk patients (including smokers and former smokers) wherein their risk for the presence of cancer is quantified in terms of an increase over others in the same risk population. Such a test would ideally help healthcare practitioners communicate to patients their relative risk of having cancer in a way that is clear and quantitative but avoids absolute “yes or no” results associated with false positives or negatives, which discourage patients from being tested on a routine basis.

It would also be desirable to have such a test that gives physicians the ability to prioritize and target those patients at the highest risk for lung cancer for advanced testing such as CT scans.

These and other advantages of the present invention may be better understood by referring to the following description, accompanying drawings and claims. This description of an embodiment, set out below to enable one to practice an implementation of the invention, is not intended to limit the preferred embodiment, but to serve as a particular example thereof. Those skilled in the art should appreciate that they may readily use the conception and specific embodiments disclosed as a basis for modifying or designing other methods and systems for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent assemblies do not depart from the spirit and scope of the invention in its broadest form.

SUMMARY OF THE INVENTION

The present invention relates generally to non-invasive methods and tests to help assess the likelihood that a patient has cancer relative to a wider patient population as a first step to determine whether that patient should be followed up with additional, more invasive cancer testing. It has now been discovered that by use of retrospective clinical samples (cancer and control) and a panel of biomarkers for cancer, asymptomatic patients can now have their risk for the presence of cancer quantified in terms of an increase over the population. It is now possible to produce meaningful information for physicians in at-risk, but asymptomatic, patient population groups that can be used to inform further screening procedures.

More specifically, the invention includes, for example, a blood test for assessing the likelihood that a patient has lung cancer relative to a population of individuals of a similar age range and smoking history. In this example, several biomarkers are analyzed from the patient's fluid sample, e.g., a blood sample, which leads to a composite score that is then compared to a database of composite scores from a wider population of patients known to have lung cancer as well as non-cancer controls. This permits the patient's risk of having lung cancer to be categorized as low, intermediate, high, very high, etc. Armed with this information, physicians and other healthcare practitioners, their patients, and health insurance companies can better determine which patients are most likely to benefit from follow-on testing including CT screening. Such a method reduces the costs, anxiety, and radiation exposure associated with having lower risk patients undergo CT scans while helping to ensure that patients at higher risk of having lung cancer undergo CT scanning in hopes of detecting their tumor at an early stage when they can still benefit from curative surgery.

BRIEF DESCRIPTION OF THE FIGURES

The numerous advantages of the present invention may be better understood by those skilled in the art by reference to the accompanying figures in which:

FIG. 1 shows an example of a Risk Categorization Table for lung cancer. In this risk categorization table, the inflection point at which a patient's risk becomes greater than the observed risk of smokers of 2% occurs at an aggregate MoM score of 9. With an aggregate score of 9 or less, that patient has a risk of lung cancer no greater than does any other heavy smoker not yet diagnosed. A MoM score greater than 9 indicates a greater risk of cancer or a higher likelihood of cancer as compared to the smoking population.

FIG. 2 shows a table of the distribution of patient samples analyzed, including patients with all stages of cancer, at risk populations and various other control groups including those with non-cancerous lung disorders and other cancers.

FIG. 3 shows a receiver operator characteristic (ROC) curve analysis of all lung cancer vs. all non-cancer samples, which yielded an area under the curve (AUC) of 0.76.

FIG. 4 shows, in table form, the statistical validation using a cohort of 322 samples obtained with the specific intent of early detection in the high risk population.

FIG. 5 shows the ROC curve analysis for the cohort of 322 samples with an AUC of 0.73.

FIG. 6 shows the linearity of one of the tumor markers in a spike and recovery assay.

FIG. 7 shows the biomarker precision and repeatability from a clinical bridging study in table form.

FIG. 8 shows results from a blinded retrospective study using the six lung cancer biomarker panel.

FIG. 9 shows, in table form, results from the lung cancer assay for at-risk subjects re-categorizing the patient's risk for the presence of lung cancer.

FIG. 10 shows results from the lung cancer assay for at-risk subjects re-categorizing the patient's risk, based on a range of composite scores, for the presence of lung cancer.

DETAILED DESCRIPTION OF THE INVENTION

A) Introduction

The present invention provides a risk categorization of a population used to determine a quantified risk level for the presence of a cancer in an asymptomatic human subject. The method is preferably used as part of a blood test that measures multiple biomarkers in the blood. In certain embodiments, the risk categorization is herein referred to as a risk categorization table. As used herein, the term “table” is used in its broadest sense to refer to a grouping of data into a format providing for ease of interpretation or presentation; this includes, but is not limited to, a computer program, software application, table, sliding template (e.g., pinwheel), spreadsheet, etc. Thus, in one embodiment the risk categorization table is a grouping of stratified human subject populations. This stratification of human subjects is based on analysis of retrospective clinical samples from subjects having a cancer wherein the actual incidence of cancer, herein referred to as the “positive predictive score”, is determined for each stratified grouping. As used herein, the analysis of retrospective clinical samples refers to measurement of markers in those samples, including normalization of values and summing those values to generate a risk score for each sample. The positive predictive score is then converted to a multiplier indicating increased likelihood of having the cancer by dividing the positive predictive score by the reported incidence of cancer in the cohort of the population subject to stratification (e.g., human subjects 50 years or older). Each grouping is given a risk categorization indicator, including, but not limited to, low risk, intermediate-low risk, intermediate risk, intermediate-high risk and highest risk. Thus, in one embodiment, each category of the risk categorization table comprises 1) a multiplier indicating increased likelihood of having the cancer, 2) a risk identifier and 3) a range of composite scores.
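The conversion of a positive predictive score into a multiplier, as described above, can be sketched as follows. This is a minimal illustration only: the function name, the argument names and the sample counts are assumptions introduced here and do not come from the disclosure.

```python
def risk_multiplier(cancers_in_group, subjects_in_group, cohort_incidence):
    """Divide a group's positive predictive score by the cohort incidence.

    cancers_in_group / subjects_in_group is the observed incidence of cancer
    in one stratified grouping (the "positive predictive score").
    cohort_incidence is the reported incidence for the whole cohort,
    expressed as a fraction (e.g. 0.02 for 2%).
    """
    if subjects_in_group <= 0:
        raise ValueError("subjects_in_group must be positive")
    if not 0 < cohort_incidence <= 1:
        raise ValueError("cohort_incidence must be a fraction in (0, 1]")
    positive_predictive_score = cancers_in_group / subjects_in_group
    return positive_predictive_score / cohort_incidence


# Hypothetical grouping: 30 cancers among 100 retrospective samples,
# in a cohort with a reported incidence of 2%.
multiplier = risk_multiplier(30, 100, 0.02)
print(round(multiplier, 2))
```

A multiplier below 1 would correspond to a grouping whose observed incidence is lower than that of the cohort as a whole.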

It is understood that the stratification of a cohort of a population of human subjects is based on 1) an identification of a certain cancer and 2) biomarkers that are associated with the cancer. In other words, a cohort shares the same cancer risk factors. Validation of the biomarkers to be used in the present methods is provided by analyzing retrospective cancer samples along with age matched normal (non-cancer) samples.

The generation of a risk categorization table, including methods for normalizing biomarker data, is provided in more detail below along with a specific example for lung cancer.

The present invention further provides an algorithm for analyzing a panel of biomarkers for a cancer and quantifying a human subject's increased risk (or in certain circumstances decreased risk) for the presence of the cancer in an asymptomatic human subject relative to a population. As used herein, the term “increased risk” refers to an increase for the presence of the cancer as compared to the known prevalence of that particular cancer across the population cohort. The present methods are based on the generation of a risk categorization table for a certain cancer, wherein there is no intended limitation on when this table is generated, only that when utilized the quantified risk is at the time of testing. Thus, the present method and risk categorization table are based, at least in part, on 1) the identification and clustering of a set of proteins and/or resulting autoantibodies to those proteins that can serve as markers for the presence of a cancer; 2) normalization and summing of the markers measured to generate a composite score; and 3) determination of threshold values used to divide patients into groups with varying degrees of risk for the presence of cancer in which the likelihood of an asymptomatic human subject having a quantified increased risk for the presence of the cancer is determined. The algorithm yields a numerical risk score for each patient tested, which can be used by physicians to make treatment decisions concerning the therapy of cancer patients or, importantly, to further inform screening procedures to better predict and diagnose early stage cancer in asymptomatic patients.

B) Definitions

As used herein, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.”

As used herein, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated.

As used herein, the term “about” is used to refer to an amount that is approximately, nearly, almost, or in the vicinity of being equal to or is equal to a stated amount, e.g., the stated amount plus/minus about 5%, about 4%, about 3%, about 2% or about 1%.

As used herein, the term “asymptomatic” refers to a patient or human subject that has not previously been diagnosed with the same cancer that their risk of having is now being quantified and categorized. For example, human subjects may show signs such as coughing, fatigue, pain, etc., but have not been previously diagnosed with lung cancer and are now undergoing screening to categorize their increased risk for the presence of cancer; for the present methods they are still considered “asymptomatic”.

As used herein, the term “AUC” refers to the Area Under the Curve, for example, of a ROC Curve. That value can assess the merit of a test on a given sample population with a value of 1 representing a good test ranging down to 0.5 which means the test is providing a random response in classifying test subjects. Since the range of the AUC is only 0.5 to 1.0, a small change in AUC has greater significance than a similar change in a metric that ranges from 0 to 1 or 0 to 100%. When the % change in the AUC is given, it will be calculated based on the fact that the full range of the metric is 0.5 to 1.0. A variety of statistics packages can calculate AUC for an ROC curve, such as JMP™ or Analyse-It™. AUC can be used to compare the accuracy of the classification algorithm across the complete data range. Classification algorithms with greater AUC have, by definition, a greater capacity to classify unknowns correctly between the two groups of interest (disease and no disease). The classification algorithm may be as simple as the measure of a single molecule or as complex as the measure and integration of multiple molecules.
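For illustration, the AUC of a score that separates two groups can be computed directly by pairwise comparison, which is equivalent to the area under the empirical ROC curve. The sketch below is not the procedure of any named statistics package. The helper `auc_change_percent` reflects one possible reading of the statement that a % change is calculated against the 0.5 to 1.0 range (an assumption made here), and both function names are hypothetical.

```python
def auc(disease_scores, control_scores):
    """Fraction of (disease, control) pairs in which the disease sample
    scores higher; ties count as half. Equals the empirical ROC AUC."""
    if not disease_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for d in disease_scores:
        for c in control_scores:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(disease_scores) * len(control_scores))


def auc_change_percent(auc_old, auc_new):
    """Assumed reading: express a change in AUC as a percentage of the
    usable range of the metric (0.5 to 1.0), not of the 0 to 1 interval."""
    usable_range = 1.0 - 0.5
    return 100.0 * (auc_new - auc_old) / usable_range


# Illustrative scores only; not data from the disclosure.
print(auc([12.0, 9.5, 15.0], [6.0, 9.5, 8.0]))
```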

As used herein, the terms “biological sample” and “test sample” refer to all biological fluids and excretions isolated from any given subject. In the context of the present invention such samples include, but are not limited to, blood, blood serum, blood plasma, urine, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, bronchial and other lavage samples, or tissue extract samples. In certain embodiments, blood, serum, plasma and bronchial lavage or other liquid samples are convenient test samples for use in the context of the present methods.

As used herein, the terms “cancer” and “cancerous” refer to or describe the physiological condition in mammals that is typically characterized by unregulated cell growth. Examples of cancer include, but are not limited to, lung cancer, breast cancer, colon cancer, prostate cancer, hepatocellular cancer, gastric cancer, pancreatic cancer, cervical cancer, ovarian cancer, liver cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, and brain cancer.

As used herein, the term “cancer risk factors” refers to biological or environmental influences that are known risks associated with a particular cancer. These cancer risk factors include, but are not limited to, a family history of cancer (e.g. breast cancer), age, weight, sex, history of smoking tobacco, exposure to asbestos, exposure to radiation, etc. It is understood that these cancer risk factors, either individually or a combination thereof, contribute to selecting a cohort of the population used to develop a Risk Categorization Table and that this same cohort is then tested using the present methods and algorithm to determine their increased risk for the presence of cancer as compared to the known prevalence of cancer across the cohort. In certain embodiments, cancer risk factors for lung cancer are a human subject aged 50 years or older with a history of smoking tobacco.

As used herein, the term “cohort” refers to a group or segment of human subjects with shared factors or influences, such as age, family history, cancer risk factors, environmental influences, etc. In one instance, as used herein, a “cohort” refers to a group of human subjects with shared cancer risk factors; this is also referred to herein as a “disease cohort”. In another instance, as used herein, a “cohort” refers to a normal population group matched, for example by age, to the cancer risk cohort; also referred to herein as a “normal cohort”.

As used herein, the term “composite score” refers to a summation of the normalized values for the predetermined markers measured in the sample from the human subject. In one embodiment, the normalized values are reported as multiple of median (MoM) values and those MoM values are then summed to provide a composite score for each human subject tested. When used in the context of the risk categorization table and correlated to a stratified grouping based on a range of composite scores in the Risk Categorization Table, the “composite score” is used to determine the “risk score” for each human subject tested wherein the multiplier indicating increased likelihood of having the cancer for the stratified grouping becomes the “risk score”. See FIG. 1.
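The path from MoM values to a composite score and then to a risk score can be sketched as a simple range lookup. In the sketch below only the boundary at a composite score of 9 is taken from the description of FIG. 1; every other band boundary, label assignment and multiplier is a hypothetical placeholder, as are the names `BAND_UPPER_BOUNDS`, `BANDS`, `composite_score` and `risk_score`.

```python
from bisect import bisect_left

# Inclusive upper bounds of the composite-score bands.
# 9.0 follows the FIG. 1 description; 14.0 and 20.0 are hypothetical.
BAND_UPPER_BOUNDS = [9.0, 14.0, 20.0]

# (risk identifier, multiplier) per band; all multipliers are hypothetical.
BANDS = [
    ("low risk", 1.0),
    ("intermediate risk", 3.0),
    ("intermediate-high risk", 8.0),
    ("highest risk", 15.0),
]


def composite_score(mom_values):
    """Sum the normalized (MoM) values measured for one subject."""
    return sum(mom_values)


def risk_score(composite):
    """Return the (risk identifier, multiplier) of the band containing
    the composite score; the multiplier is the subject's risk score."""
    return BANDS[bisect_left(BAND_UPPER_BOUNDS, composite)]


# Six illustrative MoM values for a six-marker panel.
score = composite_score([1.0, 2.5, 0.5, 1.0, 3.0, 1.0])
print(score, risk_score(score))
```

Using `bisect_left` keeps the upper bound of each band inclusive, so a composite score of exactly 9 stays in the lowest band, in line with "an aggregate score of 9 or less".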

In certain aspects the “composite score” is also referred to herein as the “test score”.

As used herein, the term “decision tree” refers to a classifier with a flow-chart-like tree structure employed for classification. Decision trees consist of repeated splits of a data set into subsets. Each split consists of a simple rule applied to one variable, e.g., “if value of ‘variable 1’ larger than ‘threshold 1’; then go left, else go right”. Accordingly, the given feature space is partitioned into a set of rectangles with each rectangle assigned to one class.
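A minimal sketch of such a classifier is shown below, using the “larger than threshold goes left” rule quoted above. The tree itself, its variable names, thresholds and class labels are hypothetical and serve only to show the traversal.

```python
# Hypothetical two-level tree; names, thresholds and labels are illustrative.
TREE = {
    "variable": "variable_1",
    "threshold": 2.0,
    "left": {
        "variable": "variable_2",
        "threshold": 1.5,
        "left": "class_A",
        "right": "class_B",
    },
    "right": "class_C",
}


def classify(sample, node=TREE):
    """Apply one single-variable rule per split until a leaf is reached.

    Internal nodes are dicts; leaves are class-label strings.
    """
    while isinstance(node, dict):
        if sample[node["variable"]] > node["threshold"]:
            node = node["left"]   # value larger than threshold: go left
        else:
            node = node["right"]  # otherwise: go right
    return node


print(classify({"variable_1": 3.0, "variable_2": 1.0}))
```

Each leaf corresponds to one rectangle of the feature space: here `class_B` covers the region where `variable_1` is above 2.0 and `variable_2` is at most 1.5.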

As used herein, the terms “differentially expressed gene,” “differential gene expression” and their synonyms, which are used interchangeably, are used in the broadest sense and refer to a gene and/or resulting protein whose expression is activated to a higher or lower level in a subject suffering from a disease, specifically cancer, such as lung cancer, relative to its expression in a normal or control subject. The terms also include genes whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example. Differential gene expression may include a comparison of expression between two or more genes or their gene products (e.g., proteins), or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects suffering from a disease, specifically cancer, or between various stages of the same disease. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages.

As used herein, the term “gene expression profiling” is used in the broadest sense, and includes methods of quantification of mRNA and/or protein levels in a biological sample.

As used herein, the term “increased risk” refers to an increase in the risk level, for a human subject after testing, for the presence of a cancer relative to a population's known prevalence of a particular cancer before testing. In other words, a human subject's risk for cancer before testing may be 2% (based on the understood prevalence of cancer in the population), but after testing (based on the measure of biomarkers) their risk for the presence of cancer may be 30% or alternatively reported as an increase of 15 times compared to the cohort. The algorithm for calculating the 30% risk of having the cancer and the increased risk of 15 times the cohort population is provided in more detail below. It is also contemplated, as will be apparent from the present Risk Categorization Table and accompanying algorithm, that it is possible that the re-categorization of a patient's risk for the presence of a cancer results in a risk that is less than the known prevalence of a particular cancer across a population cohort. For example, a human subject's risk for cancer before testing may be 2% (based on the understood prevalence of cancer in the population), but after testing (based on the measure of biomarkers) their risk for the presence of cancer may be 1% or alternatively reported as an increase of 0.5 times compared to the cohort. In this instance, “increased risk” refers to a change in risk level relative to a population before testing.
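As a worked illustration of the figures quoted in this definition, the reported multiplier is the ratio of the post-test risk to the pre-test prevalence in the cohort. The percentages are those given in the text; the symbols M, R_post and P_cohort are introduced here only for illustration.

```latex
\[
M = \frac{R_{\text{post}}}{P_{\text{cohort}}}, \qquad
\frac{30\%}{2\%} = 15, \qquad
\frac{1\%}{2\%} = 0.5
\]
```

A multiplier of 15 therefore corresponds to the 30% post-test risk, and a multiplier of 0.5 to the 1% post-test risk, against the same 2% starting prevalence.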

As used herein, the term “decreased risk” refers to a decrease in the risk level, for a human subject after testing, for the presence of a cancer relative to a population's known prevalence of a particular cancer before testing. In this instance, “decreased risk” refers to a change in risk level relative to a population before testing.

As used herein, the term “lung cancer” refers to a cancer state associated with the pulmonary system of any given subject. In the context of the present invention, lung cancers include, but are not limited to, adenocarcinoma, epidermoid carcinoma, squamous cell carcinoma, large cell carcinoma, small cell carcinoma, non-small cell carcinoma, and bronchoalveolar carcinoma. Within the context of the present invention, lung cancers may be at different stages, as well as varying degrees of grading. Methods for determining the stage of a lung cancer or its degree of grading are well known to those skilled in the art.

As used herein, the terms “marker”, “biomarker” (or fragment thereof) and their synonyms, which are used interchangeably, refer to molecules that can be evaluated in a sample and are associated with a physical condition. For example, markers include expressed genes or their products (e.g. proteins) or autoantibodies to those proteins that can be detected from human samples, such as blood, serum, solid tissue, and the like, and that are associated with a physical or disease condition. Such biomarkers include, but are not limited to, biomolecules comprising nucleotides, amino acids, sugars, fatty acids, steroids, metabolites, polypeptides, proteins (such as, but not limited to, antigens and antibodies), carbohydrates, lipids, hormones, antibodies, regions of interest which serve as surrogates for biological molecules, combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins) and any complexes involving any such biomolecules, such as, but not limited to, a complex formed between an antigen and an autoantibody that binds to an available epitope on said antigen. The term “biomarker” can also refer to a portion of a polypeptide (parent) sequence that comprises at least 5 consecutive amino acid residues, preferably at least 10 consecutive amino acid residues, more preferably at least 15 consecutive amino acid residues, and retains a biological activity and/or some functional characteristics of the parent polypeptide, e.g. antigenicity or structural domain characteristics. The present markers refer to both tumor antigens present on or in cancerous cells or those that have been shed from the cancerous cells into bodily fluids such as blood or serum. The present markers, as used herein, also refer to autoantibodies produced by the body to those tumor antigens. In one aspect, a “marker” as used herein refers to both tumor antigens and autoantibodies that are capable of being detected in serum of a human subject. It is also understood in the present methods that the markers in a panel may each contribute equally to the composite score, or certain biomarkers may be weighted wherein the markers in a panel contribute a different weight or amount to the final composite score.

As used herein, the term “multiplier indicating increased likelihood of having the cancer” refers to a numerical value of the risk categorization table that is assigned to a patient sample after testing, quantifying that patient's increased risk, above the cohort population, for the presence of a cancer. When used in the context of the risk categorization table when testing a human subject and correlated to a range of composite scores, the “multiplier indicating increased likelihood of having the cancer” becomes the “risk score” for each human subject tested. See FIG. 1.

As used herein, the terms “multiple of median” or “MoM” refer to a measure of how far an individual test result deviates from the median. In the present method a predetermined marker is measured in a sample from an asymptomatic subject and the value is normalized as a multiple of median score.

As used herein, the term “normalization” and its derivatives, when used in conjunction with measurement of biomarkers across samples and time, refer to mathematical methods intended to produce normalized values that can be compared with corresponding normalized values from different datasets in a way that eliminates or minimizes differences and gross influences. In one embodiment, multiple of median is used as the normalization methodology for the present methods.
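As an illustration of multiple of median normalization, the following minimal Python sketch divides a measured value by the median of a reference set. The function name and the sample numbers are hypothetical, and the choice of a control group as the reference set is an assumption made for this example only.

```python
from statistics import median


def multiple_of_median(value, reference_values):
    """Return the measured value divided by the median of the reference values.

    Assumption: reference_values holds measurements of the same marker in a
    reference (e.g. control) population.
    """
    ref_median = median(reference_values)
    if ref_median <= 0:
        raise ValueError("reference median must be positive")
    return value / ref_median


# Hypothetical control measurements for one marker (arbitrary units).
controls = [2.0, 3.0, 4.0, 5.0, 6.0]
print(multiple_of_median(8.0, controls))  # median is 4.0, so MoM is 2.0
```

A MoM of 1.0 means the subject's value equals the reference median, and values above 1.0 indicate a level above that median.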

As used herein, the terms “panel of markers”, “panel of biomarkers” and their synonyms, which are used interchangeably, refer to more than one marker that can be detected from a human sample and that together are associated with the presence of a particular cancer. In an embodiment of the present application, the biomarkers are not individually quantified as absolute values to indicate the presence of a cancer; instead the measured values are normalized and the normalized values are summed to provide a composite score. As disclosed above, each marker in the panel may be given the weight of 1, or some other value that is either a fraction of 1 or a multiple of 1, depending on the contribution of the marker to the cancer being screened and the overall composition of the panel.

As used herein, the term “pathology” of (tumor) cancer includes all phenomena that compromise the well-being of the patient. This includes, without limitation, abnormal or uncontrollable cell growth, metastasis, interference with the normal functioning of neighboring cells, release of cytokines or other secretory products at abnormal levels, suppression or aggravation of inflammatory or immunological response, neoplasia, premalignancy, malignancy, invasion of surrounding or distant tissues or organs, such as lymph nodes, etc.

As used herein, the term “known prevalence of cancer” refers to a prevalence of a cancer in a population before the human subject is tested using the present methods. This known prevalence of cancer can be a prevalence reported in the literature based on retrospective data or an algorithm applied to that prevalence wherein the algorithm takes into account factors such as age and more immediate and relevant history. In this instance, a known prevalence of cancer in a cohort refers to a risk of having cancer prior to being tested by the present methods.

As used herein, the terms “a positive predictive score,” “a positive predictive value,” or “PPV” refer to the likelihood that a score within a certain range on a biomarker test is a true positive result. It is defined as the number of true positive results divided by the number of total positive results. True positive results can be calculated by multiplying the test Sensitivity times the Prevalence of disease in the test population. False positives can be calculated by multiplying (1 minus the Specificity) times (1 minus the Prevalence of disease in the test population). Total positive results equal True Positives plus False Positives.
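The PPV arithmetic defined above can be written out as a short Python sketch. The function name and the example sensitivity, specificity and prevalence are illustrative assumptions, not values taken from this disclosure.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / (true positives + false positives).

    All three inputs are fractions between 0 and 1.
    """
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    total_positives = true_positives + false_positives
    if total_positives == 0:
        raise ValueError("no positive results expected; PPV is undefined")
    return true_positives / total_positives


# Hypothetical test: 80% sensitivity, 90% specificity, 2% prevalence.
ppv = positive_predictive_value(0.80, 0.90, 0.02)
print(round(ppv, 4))  # 0.016 / (0.016 + 0.098)
```

With these hypothetical inputs, only about 14% of positive results are true positives, which illustrates why prevalence matters when interpreting a positive result.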

As used herein, the term “risk score” refers to a single numerical value that indicates an asymptomatic human subject's increased risk for the presence of a cancer as compared to the known prevalence of cancer in the disease cohort. In certain embodiments of the present methods, the composite score is calculated for a human subject and correlated to a multiplier indicating increased likelihood of having the cancer, wherein the composite score is correlated based on the range of composite scores for each stratified grouping in the risk categorization table. In this way the composite score is converted to a risk score based on the multiplier indicating increased likelihood of having the cancer for the grouping that is the best match for the composite score. See FIG. 1.
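The conversion of a composite score to a risk score can be pictured as a table lookup. The Python sketch below assumes a hypothetical risk categorization table; the score ranges and multipliers are invented for illustration and are not the values of FIG. 1.

```python
# Hypothetical table rows: (lower bound inclusive, upper bound exclusive, multiplier).
RISK_TABLE = [
    (0.0, 10.0, 0.5),
    (10.0, 20.0, 2.0),
    (20.0, float("inf"), 8.0),
]


def risk_score(composite, table=RISK_TABLE):
    """Return the multiplier of the grouping whose range contains the composite score."""
    for low, high, multiplier in table:
        if low <= composite < high:
            return multiplier
    raise ValueError("composite score is outside the range covered by the table")


print(risk_score(12.5))  # falls in the 10-20 grouping of the hypothetical table
```

Under these assumed values, a subject with a composite score of 12.5 would be reported as having twice the risk of the cohort as a whole.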

As used herein, the term “Receiver Operating Characteristic Curve,” or “ROC curve,” refers to a plot of the performance of a particular feature for distinguishing two populations: patients with lung cancer, and controls, e.g., those without lung cancer. Data across the entire population (namely, the patients and controls) are sorted in ascending order based on the value of a single feature. Then, for each value for that feature, the true positive and false positive rates for the data are determined. The true positive rate is determined by counting the number of cases above the value for that feature under consideration and then dividing by the total number of patients. The false positive rate is determined by counting the number of controls above the value for that feature under consideration and then dividing by the total number of controls.

ROC curves can be generated for a single feature as well as for other single outputs, for example, a combination of two or more features that are combined (such as added, subtracted, multiplied, etc.) to provide a single combined value which can be plotted in a ROC curve.

The ROC curve is a plot of the true positive rate (sensitivity) of a test against the false positive rate (1-specificity) of the test. ROC curves provide another means to quickly screen a data set.
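The counting procedure described above for a single feature can be sketched in Python as follows. The function names and the toy data are hypothetical; the area under the curve is estimated here with the trapezoidal rule, which is one common choice rather than a method prescribed by this disclosure.

```python
def roc_points(case_values, control_values):
    """Return sorted (false positive rate, true positive rate) pairs.

    For each observed value, cases and controls strictly above that value
    are counted and divided by the respective group totals.
    """
    thresholds = sorted(set(case_values) | set(control_values))
    points = [(1.0, 1.0)]  # a threshold below every value calls everything positive
    for t in thresholds:
        tpr = sum(v > t for v in case_values) / len(case_values)
        fpr = sum(v > t for v in control_values) / len(control_values)
        points.append((fpr, tpr))
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal estimate of the area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: one marker measured in three cases and three controls.
cases = [3, 4, 5]
controls = [1, 2, 3]
print(area_under_curve(roc_points(cases, controls)))
```

An area of 1.0 corresponds to perfect separation of cases from controls, and 0.5 corresponds to no discrimination.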

As used herein, the term “screening” refers to a strategy used in a population to identify an unrecognized cancer in asymptomatic subjects, for example those without signs or symptoms of the cancer. As used herein, a cohort of the population (e.g. smokers aged 50 or older) is screened for a particular cancer (e.g. lung cancer) wherein the present algorithm is applied to determine the quantified increased risk to those asymptomatic subjects for the presence of the cancer.

As used herein, the term “subject” refers to an animal, preferably a mammal, including a human or non-human. The terms “patient” and “human subject” may be used interchangeably herein.

As used herein, the term “tumor” refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.

As used herein, the phrase “Weighted Scoring Method” refers to a method that involves converting the measurement of one biomarker that is identified and quantified in a test sample into one of many potential scores. A ROC curve can be used to standardize the scoring between different markers by enabling the use of a weighted score based on the inverse of the false positive % defined from the ROC curve. The weighted score can be calculated by multiplying the AUC by a factor for a marker and then dividing by the false positive % based on a ROC curve. The weighted score can be calculated using the formula:

Weighted Score = (AUC_x × factor) / (1 − % specificity_x)

wherein x is the marker; the “factor” is a real number (such as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 and so on) applied throughout the panel; and the “specificity” is a chosen value that does not exceed 95%. Multiplication by a factor for the panel allows the user to scale the weighted score. Hence, the measurement of one marker can be converted into as many or as few scores as desired.

The weighting provides higher scores for biomarkers with a low false positive rate (thereby having higher specificity) for the population of interest. The weighting paradigm can comprise electing levels of false positivity (1-specificity) below which the test will result in an increased score. Thus, markers with high specificity can be given a greater score or a greater range of scores than markers that are less specific.
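A minimal Python sketch of the weighted score formula follows. It assumes that specificity is expressed as a fraction (0.90 for 90%), so that the denominator is the false positive fraction; the function name and the example AUC are hypothetical.

```python
def weighted_score(auc, factor, specificity):
    """Weighted score = (AUC x factor) / (1 - specificity).

    Assumption: specificity is given as a fraction and, per the definition
    above, is a chosen value that does not exceed 0.95.
    """
    if not 0.0 <= specificity <= 0.95:
        raise ValueError("specificity must be between 0 and 0.95")
    return auc * factor / (1.0 - specificity)


# Hypothetical marker with an AUC of 0.8 and a panel-wide factor of 1.
for spec in (0.80, 0.90, 0.95):
    print(spec, round(weighted_score(0.8, 1, spec), 2))
```

For this hypothetical marker the score rises from about 4 at 80% specificity to about 16 at 95% specificity, showing how the formula rewards a result that exceeds a more stringent cutoff.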

A foundation for assessing the parameters for weighting can be obtained by determining presence of a marker in a population of patients with lung cancer and in normal individuals. The information (data) obtained from all the samples is used to generate a ROC curve and to create an AUC for each biomarker. A number of predetermined cutoffs and a weighted score are assigned to each biomarker based on the % specificity. That calculus provides a stratification of aggregate scores, and those scores can be used to define ranges that correlate to arbitrary risk categories of whether one has a higher or lower risk of having lung cancer. The number of categories can be a design choice or may be driven by the data.

C) Methods for Determining a Quantified Risk Level for the Presence of a Cancer in an Asymptomatic Human

In certain embodiments, provided herein are methods for quantifying the risk level of an asymptomatic patient relative to a population. In one aspect, the risk level is increased as compared to the population. In another aspect, the risk level is decreased as compared to the population. The asymptomatic patients that, after testing, have a quantified increased risk for the presence of cancer relative to the population are those that a physician may select for follow-on testing, and it is important to know not only that their risk is increased relative to the population, but also by how much.

Therefore, in certain embodiments, the method of determining a quantified increased risk for the presence of a cancer in an asymptomatic human subject comprises: 1) measuring a panel of markers in a sample from the human subject; 2) determining a normalized value of each marker in the sample from the human subject; 3) summing the normalized values to obtain a composite score for the human subject; 4) quantifying the increased risk for the presence of the cancer for the human subject as a risk score, wherein the composite score is matched to a risk category of a grouping of stratified human subject populations wherein each risk category comprises a multiplier indicating increased likelihood of having the cancer correlated to a range of composite scores; and, 5) providing a risk score for the human subject, whereby the quantified increased risk for the presence of a cancer in an asymptomatic human subject has been determined.

One or more steps of the method described herein can be performed manually or can be completely or partially automated (for example, one or more steps of the method can be performed by a computer program or algorithm). If the method were to be performed via computer program or algorithm, then the performance of the method would further necessitate the use of the appropriate hardware, such as input, memory, processing, display and output devices, etc. Methods for automating one or more steps of the method would be well within the skill of those in the art.

In yet further embodiments, the present invention contemplates a specific use computer, which may be a general purpose computer, configured to perform the steps of the method described herein. The method, or portions of the method, may be further embodied in a computer readable medium capable of being executed in a computer environment. Such computer readable medium may be a specific storage device, such as a disk, or a location on a server, physical or virtual, that may be accessed by a computer for performing the required steps of the method.

i) Measuring Markers in a Sample

The first step in the present method is measuring a panel of markers from an asymptomatic human subject. There are many methods known in the art for measuring either gene expression (e.g. mRNA) or the resulting gene products (e.g. polypeptides or proteins) that can be used in the present methods.

The method of interest is not limited to any one assay format or to any particular set of markers that comprise a panel. For example, PCT International Pat. Pub. No. WO 2009/006323; US Pub. No. 2012/0071334; US Pat. Pub. No. 2008/0160546; US Pat. Pub. No. 2008/0133141; and US Pat. Pub. No. 2007/0178504 (each herein incorporated by reference) teach a multiplex lung cancer assay using beads as the solid phase and fluorescence or color as the reporter in an immunoassay format. Hence, the degree of fluorescence or color can be provided in the form of a qualitative score as compared to an actual quantitative value of reporter presence and amount.

For example, the presence and quantification of one or more antigens or antibodies in a test sample can be determined using one or more immunoassays that are known in the art. Immunoassays typically comprise: (a) providing an antibody (or antigen) that specifically binds to the biomarker (namely, an antigen or an antibody); (b) contacting a test sample with the antibody or antigen; and (c) detecting the presence of a complex of the antibody bound to the antigen in the test sample or a complex of the antigen bound to the antibody in the test sample.

Well known immunological binding assays include, for example, an enzyme linked immunosorbent assay (ELISA), which is also known as a “sandwich assay”, an enzyme immunoassay (EIA), a radioimmunoassay (RIA), a fluoroimmunoassay (FIA), a chemiluminescent immunoassay (CLIA), a counting immunoassay (CIA), a filter media enzyme immunoassay (MEIA), a fluorescence-linked immunosorbent assay (FLISA), agglutination immunoassays and multiplex fluorescent immunoassays (such as the Luminex Lab MAP), immunohistochemistry, etc. For a review of the general immunoassays, see also, Methods in Cell Biology: Antibodies in Cell Biology, volume 37 (Asai, ed. 1993); Basic and Clinical Immunology (Daniel P. Stites; 1991).

The immunoassay can be used to determine a test amount of an antigen in a sample from a subject. First, a test amount of an antigen in a sample can be detected using the immunoassay methods described above. If an antigen is present in the sample, it will form an antibody-antigen complex with an antibody that specifically binds the antigen under suitable incubation conditions described above. The amount of an antibody-antigen complex can be determined by comparing the measured value to a standard or control. The AUC for the antigen can then be calculated using known techniques, such as, but not limited to, a ROC analysis.

In another embodiment, gene expression of markers (e.g. mRNA) is measured in a sample from a human subject. For example, gene expression profiling methods for use with paraffin-embedded tissue include quantitative reverse transcriptase polymerase chain reaction (qRT-PCR); however, other technology platforms, including mass spectroscopy and DNA microarrays, can also be used. These methods include, but are not limited to, PCR, Microarrays, Serial Analysis of Gene Expression (SAGE), and Gene Expression Analysis by Massively Parallel Signature Sequencing (MPSS).

Any methodology that provides for the measurement of a marker or panel of markers from a human subject is contemplated for use with the present methods. In certain embodiments, the sample from the human subject is a tissue section such as from a biopsy. In another embodiment, the sample from the human subject is a bodily fluid such as blood, serum, plasma or a part or fraction thereof. In other embodiments, the sample is blood or serum and the markers are proteins measured therefrom. In yet another embodiment, the sample is a tissue section and the markers are mRNA expressed therein. Many other combinations of sample forms from the human subjects and the form of the markers are contemplated.

ii) Biomarkers

However, before measurement can be performed, a panel of markers needs to be selected for a particular cancer being screened. Many markers are known for cancers and a known panel can be selected, or as was done by the present Applicants, a panel can be selected based on measurement of individual markers in retrospective clinical samples wherein a panel is generated based on empirical data for a desired cancer.

Examples of biomarkers that can be employed include molecules detectable, for example, in a body fluid sample, such as antibodies, antigens, small molecules, proteins, hormones, genes and so on.

In a particular embodiment, a panel of markers is selected based on their association with lung cancer. In one embodiment, the panel of markers is selected from anti-p53, anti-NY-ESO-1, anti-ras, anti-Neu, anti-MAPKAPK3, cytokeratin 8, cytokeratin 19, cytokeratin 18, CEA, CA125, CA15-3, CA19-9, Cyfra 21-1, serum amyloid A, proGRP and α1-anti-trypsin (US 20120071334; US 20080160546; US 20080133141; US 20070178504 (each herein incorporated by reference)). Many circulating proteins have more recently been identified as possible biomarkers for the occurrence of lung cancer, for example the proteins CEA, RBP4, hAAT, SCCA [Patz, E. F., et al., Panel of Serum Biomarkers for the Diagnosis of Lung Cancer. Journal of Clinical Oncology, 2007. 25(35): p. 5578-5583]; the proteins IL6, IL-8 and CRP [Pine, S. R., et al., Increased Levels of Circulating Interleukin 6, Interleukin 8, C-Reactive Protein, and Risk of Lung Cancer. Journal of the National Cancer Institute, 2011. 103(14): p. 1112-1122.]; the proteins TNF-α, CYFRA 21-1, IL-1ra, MMP-2, monocyte chemotactic protein-1 & sE-selectin [Farlow, E. C., et al., Development of a Multiplexed Tumor-Associated Autoantibody-Based Blood Test for the Detection of Non-Small Cell Lung Cancer. Clinical Cancer Research, 2010. 16(13): p. 3452-3462.]; the proteins prolactin, transthyretin, thrombospondin-1, E-selectin, C-C motif chemokine 5, macrophage migration inhibitory factor, plasminogen activator inhibitor, receptor tyrosine-protein kinase, erbb-2, cytokeratin fragment 21.1, and serum amyloid A [Bigbee, W. L. P., et al., A Multiplexed Serum Biomarker Immunoassay Panel Discriminates Clinical Lung Cancer Patients from High-Risk Individuals Found to be Cancer-Free by CT Screening. Journal of Thoracic Oncology, April 2012. 7(4): p. 698-708.]; the proteins EGF, sCD40 ligand, IL-8, MMP-8 [Izbicka, E., et al., Plasma Biomarkers Distinguish Non-small Cell Lung Cancer from Asthma and Differ in Men and Women. Cancer Genomics-Proteomics, 2012. 9(1): p. 27-35.].

Novel ligands that bind to circulating, lung-cancer associated proteins which are possible biomarkers include nucleic acid aptamers to bind cadherin-1, CD30 ligand, endostatin, HSP90α, LRIG3, MIP-4, pleiotrophin, PRKCI, RGM-C, SCF-sR, sL-selectin, and YES [Ostroff, R. M., et al., Unlocking Biomarker Discovery: Large Scale Application of Aptamer Proteomic Technology for Early Detection of Lung Cancer. PLoS ONE, 2010. 5(12): p. e15003.] and monoclonal antibodies that bind leucine-rich alpha-2 glycoprotein 1 (LRG1), alpha-1 antichymotrypsin (ACT), complement C9, haptoglobin beta chain [Guergova-Kuras, M., et al., Discovery of Lung Cancer Biomarkers by Profiling the Plasma Proteome with Monoclonal Antibody Libraries. Molecular & Cellular Proteomics, 2011. 10(12).]; and the protein variant Ciz1 [Higgins, G., et al., Variant Ciz1 is a circulating biomarker for early-stage lung cancer. Proceedings of the National Academy of Sciences, 2012].

Autoantibodies that are proposed to be circulating markers for lung cancer include p53, NY-ESO-1, CAGE, GBU4-5, Annexin 1, and SOX2 [Lam, S., et al., EarlyCDT-Lung: An Immunobiomarker Test as an Aid to Early Detection of Lung Cancer. Cancer Prevention Research, 2011. 4(7): p. 1126-1134.] and IMPDH, phosphoglycerate mutase, ubiquilin, Annexin I, Annexin II, and heat shock protein 70-9B (HSP70-9B) [Farlow, E. C., et al., Development of a Multiplexed Tumor-Associated Autoantibody-Based Blood Test for the Detection of Non-Small Cell Lung Cancer. Clinical Cancer Research, 2010. 16(13): p. 3452-3462].

Micro-RNAs that are proposed to be circulating markers for lung cancer include miR-21, miR-126, miR-210, miR-486-5p [Shen, J., et al., Plasma microRNAs as potential biomarkers for non-small-cell lung cancer. Lab Invest, 2011. 91(4): p. 579-587.]; miR-15a, miR-15b, miR-27b, miR-142-3p, miR-301 [Hennessey, P. T., et al., Serum microRNA Biomarkers for Detection of Non-Small Cell Lung Cancer. PLoS ONE, 2012. 7(2): p. e32307.]; let-7b, let-7c, let-7d, let-7e, miR-10a, miR-10b, miR-130b, miR-132, miR-133b, miR-139, miR-143, miR-152, miR-155, miR-15b, miR-17-5p, miR-193, miR-194, miR-195, miR-196b, miR-199a*, miR-19b, miR-202, miR-204, miR-205, miR-206, miR-20b, miR-21, miR-210, miR-214, miR-221, miR-27a, miR-27b, miR-296, miR-29a, miR-301, miR-324-3p, miR-324-5p, miR-339, miR-346, miR-365, miR-378, miR-422a, miR-432, miR-485-3p, miR-496, miR-497, miR-505, miR-518b, miR-525, miR-566, miR-605, miR-638, miR-660, and miR-93 [United States Patent Application 20110053158]; hsa-miR-361-5p, hsa-miR-23b, hsa-miR-126, hsa-miR-527, hsa-miR-29a, hsa-let-7i, hsa-miR-19a, hsa-miR-28-5p, hsa-miR-185*, hsa-miR-23a, hsa-miR-1914*, hsa-miR-29c, hsa-miR-505*, hsa-let-7d, hsa-miR-378, hsa-miR-29b, hsa-miR-604, hsa-miR-29b, hsa-let-7b, hsa-miR-299-3p, hsa-miR-423-3p, hsa-miR-18a*, hsa-miR-1909, hsa-let-7c, hsa-miR-15a, hsa-miR-425, hsa-miR-93*, hsa-miR-665, hsa-miR-30e, hsa-miR-339-3p, hsa-miR-1307, hsa-miR-625*, hsa-miR-193a-5p, hsa-miR-130b, hsa-miR-17*, hsa-miR-574-5p and hsa-miR-324-3p [United States Patent Application 20120108462]; miR-20a, miR-24, miR-25, miR-145, miR-152, miR-199a-5p, miR-221, miR-222, miR-223, miR-320 [Chen, X., et al., Identification of ten serum microRNAs from a genome-wide serum microRNA expression profile as novel noninvasive biomarkers for nonsmall cell lung cancer diagnosis. International Journal of Cancer, 2012. 130(7): p. 1620-1628]; hsa-let-7a, hsa-let-7b, hsa-let-7d, hsa-miR-103, hsa-miR-126, hsa-miR-133b, hsa-miR-139-5p, hsa-miR-140-5p, hsa-miR-142-3p, hsa-miR-142-5p, hsa-miR-148a, hsa-miR-148b, hsa-miR-17, hsa-miR-191, hsa-miR-22, hsa-miR-223, hsa-miR-26a, hsa-miR-26b, hsa-miR-28-5p, hsa-miR-29a, hsa-miR-30b, hsa-miR-30c, hsa-miR-32, hsa-miR-328, hsa-miR-331-3p, hsa-miR-342-3p, hsa-miR-374a, hsa-miR-376a, hsa-miR-432-staR, hsa-miR-484, hsa-miR-486-5p, hsa-miR-566, hsa-miR-92a, hsa-miR-98 [Bianchi, F., et al., A serum circulating miRNA diagnostic test to identify asymptomatic high-risk individuals with early stage lung cancer. EMBO Molecular Medicine, 2011. 3(8): p. 495-503.]; miR-190b, miR-630, miR-942, and miR-1284 [Patnaik, S. K., et al., MicroRNA Expression Profiles of Whole Blood in Lung Adenocarcinoma. PLoS ONE, 2012. 7(9): p. e46045.].

In one embodiment, a panel of markers for lung cancer is selected from CEA (GenBank Accession CAE75559), CA125 (UniProtKB/Swiss-Prot: Q8WXI7.2), Cyfra 21-1 (NCBI Reference Sequence: NP_008850.1), anti-NY-ESO-1 (antigen NCBI Reference Sequence: NP_001318.1), anti-p53 (antigen GenBank: BAC16799.1) and anti-MAPKAPK3 (antigen NCBI Reference Sequence: NP_001230855.1); the first three are tumor marker proteins and the last three are autoantibodies.

In certain embodiments, a panel of markers comprises circulating markers associated with colorectal cancer (CRC); those include the microRNA miR-92 [Ng, E. K. O., et al., Differential expression of microRNAs in plasma of patients with colorectal cancer: a potential marker for colorectal cancer screening. Gut, 2009. 58(10): p. 1375-1381.] and aberrantly methylated SEPT9 DNA [deVos, T., et al., Circulating Methylated SEPT9 DNA in Plasma Is a Biomarker for Colorectal Cancer. Clinical Chemistry, 2009. 55(7): p. 1337-1346.].

In certain embodiments, a panel of markers comprises markers associated with a cancer selected from bile duct cancer, bone cancer, pancreatic cancer, cervical cancer, colon cancer, colorectal cancer, gallbladder cancer, liver or hepatocellular cancer, ovarian cancer, testicular cancer, lobular carcinoma, prostate cancer, and skin cancer or melanoma. In other embodiments, a panel of markers comprises markers associated with breast cancer.

A panel can comprise any number of markers as a design choice, seeking, for example, to maximize specificity or sensitivity of the assay. Hence, an assay of interest may ask for presence of at least one of two or more biomarkers, three or more biomarkers, four or more biomarkers, five or more biomarkers, six or more biomarkers, seven or more biomarkers, or eight or more biomarkers as a design choice.

Thus, in one embodiment, the panel of biomarkers may comprise at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine or at least ten or more different markers. In one embodiment, the panel of biomarkers comprises about two to ten different markers. In another embodiment, the panel of biomarkers comprises about four to eight different markers. In yet another embodiment, the panel of markers comprises about six different markers.

Generally, a sample is committed to the assay and the results can be a range of numbers reflecting the presence and level of presence of each of the biomarkers of the panel in the sample.

The choice of the markers may be based on the understanding that each marker, when measured and normalized, contributes equally to determining the likelihood of the presence of the cancer. Thus, in certain embodiments, each marker in the panel is measured and normalized wherein none of the markers is given any specific weight. In this instance each marker has a weight of 1.

In other embodiments, the choice of the markers may be based on the understanding that each marker, when measured and normalized, contributes unequally to determining the likelihood of the presence of the cancer. In this instance, a particular marker in the panel can either be weighted as a fraction of 1 (for example if the relative contribution is low), a multiple of 1 (for example if the relative contribution is high) or as 1 (for example when the relative contribution is neutral compared to the other markers in the panel). Thus, in certain embodiments, the present methods further comprise weighting the normalized values prior to summation of the normalized values to obtain a composite score.
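The equal and unequal weighting described in the two preceding paragraphs can be sketched in Python as a weighted sum. The marker names come from the panel discussed in this disclosure, but the normalized values and weights below are invented for illustration.

```python
def composite_score(normalized, weights=None):
    """Sum the normalized marker values, each multiplied by its weight.

    Markers without an explicit weight default to a weight of 1.
    """
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in normalized.items())


# Hypothetical normalized values for three markers.
normalized = {"CEA": 1.5, "CA125": 2.0, "anti-p53": 1.0}

print(composite_score(normalized))                               # every weight is 1
print(composite_score(normalized, {"CEA": 2.0, "CA125": 0.5}))  # unequal weights
```

Here CEA is given a multiple of 1 and CA125 a fraction of 1, while anti-p53 keeps the neutral weight of 1.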

A decision tree is a data handling approach where a series of simple dichotomous decisions guides a classification to yield a desired binary outcome. Hence, samples are partitioned based on whether values thereof are above or below calculated thresholds.

A model for scoring multiple biomarkers which attempts to employ a decision tree logic was developed by Mor et al., PNAS, 102(21):7677-7682 (2005), wherein an optimal cutoff value is obtained and a value of 0 (not likely to have cancer) or 1 (likely to have cancer) is assigned for a marker. Then, scores of individual biomarkers are combined for a final score of each sample, and the higher the final score, the higher the probability of disease.
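A simplified Python sketch of this kind of binary scoring follows. It is not the published model of Mor et al.; it assumes that a value above the cutoff scores 1 for every marker, and the marker names and cutoffs are hypothetical.

```python
def binary_panel_score(values, cutoffs):
    """Score each marker 1 if its value exceeds its cutoff, else 0, and sum the scores."""
    return sum(1 if values[name] > cutoffs[name] else 0 for name in cutoffs)


# Hypothetical cutoffs and one subject's measurements.
cutoffs = {"marker_a": 5.0, "marker_b": 10.0, "marker_c": 1.0}
subject = {"marker_a": 50.0, "marker_b": 10.5, "marker_c": 0.2}

print(binary_panel_score(subject, cutoffs))  # marker_a and marker_b exceed their cutoffs
```

Note that marker_a, at ten times its cutoff, contributes the same single point as marker_b, which barely exceeds its cutoff.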

That technique provides a binary result favored by physicians and patients. While the model makes no assumption about the distribution of the data, which contributes to its simplicity, reducing information to a 1 or 0 score results in a loss of quantitative information; for example, it diminishes the role of a more predictive marker and elevates the role of a less predictive marker.

Moreover, the collection of markers in a multiplex assay may comprise varying levels of value or predictability in diagnosing disease. Hence, the impact of any one marker on the ultimate determination may be weighted based on the aggregated data obtained in screening populations and correlating with actual pathology to provide a more discriminating or effective diagnostic assay.

An alternative approach is to find an intermediate ground by expanding the qualitative transformation of quantitative data into multiple categories as compared to only a binary classification scheme.
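One way to picture such a multi-category scheme is a marker with several ascending cutoffs, each mapped to a score. In the Python sketch below, the cutoffs and scores are hypothetical, and treating a value equal to a cutoff as having reached it is an assumption of the example.

```python
from bisect import bisect_right


def graded_score(value, cutoffs, scores):
    """Map a measurement to one of several scores using ascending cutoffs.

    scores needs one more entry than cutoffs: scores[0] applies below the
    first cutoff and scores[-1] applies at or above the last cutoff.
    """
    if len(scores) != len(cutoffs) + 1:
        raise ValueError("scores must have exactly one more entry than cutoffs")
    return scores[bisect_right(cutoffs, value)]


# Hypothetical cutoffs (e.g. set at increasing specificity) and their scores.
cutoffs = [2.0, 4.0, 8.0]
scores = [0, 1, 2, 4]

for measurement in (1.0, 2.0, 5.0, 10.0):
    print(measurement, graded_score(measurement, cutoffs, scores))
```

Compared with a single cutoff, this retains more of the quantitative information in the measurement while still producing a small set of categories.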

a) Lung Cancer Biomarkers

One embodiment is directed to a method for assessing risk of lung cancer. A research effort to identify panels of biomarkers that included a survey of known tumor protein biomarkers coupled with a discovery project for novel lung cancer specific biomarkers was previously conducted (PCT Publ. No. WO 2009/006323, incorporated herein by reference). This work indicates that a combination of markers can be used to increase sensitivity of testing for cancer without greatly affecting the specificity of the test. To accomplish this, markers were tested and analyzed in a way that is often very different from the standard methods. This effort culminated in the establishment of a panel of six biomarkers that in the aggregate yield significant sensitivity and specificity for the early detection of lung cancer using the present methods. As disclosed herein, Applicants provide a new method and algorithm that can be utilized to identify smokers at the highest levels of risk for follow-up testing by CT scanning.

In certain embodiments, the lung cancer biomarker panel comprises a series of three tumor marker proteins and three autoantibodies. Tumor markers, in such embodiments, are proteins released by the cancer itself into the patient's serum. Since the presence of these proteins or their increased expression is directly related to the cancer cells, they tend to be specific to cancer; however, they may often be found in more than one type of cancer. Furthermore, because they are derived directly from the tumor, their levels will depend on the size of the tumor. This can make them less sensitive for the detection of early stage cancers. Autoantibodies are a function of the patient's immune response to the abnormal cancerous cells. Because the immune system amplifies its response even to a small amount of antigen, autoantibodies may be detected more easily in the early stage patient than proteins released by the cancer itself. Unfortunately, due to the heterogeneity of the cancers we classify as lung cancer and the individual differences in patient immune responses, a large panel of autoantibodies is required to sensitively detect all lung cancers. Our panel combines both tumor markers and autoantibodies to achieve the greatest sensitivity for early stage lung cancer.

In certain embodiments, the tumor markers incorporated into the present methods for lung cancer comprise CEA, CA-125 and Cyfra 21-1. All three of these markers have been extensively studied by others and are currently in clinical use for monitoring of other cancers. While none of these markers has fared well as a stand-alone marker for the early detection of lung cancer, two important points must be noted: 1) these markers are not measured by the present method in the same way they have been measured in the past for other indications, and 2) these markers are not deployed as stand-alone markers but rather are incorporated as a part of an integrated panel of markers for re-stratification of patient risk. Specifically, results in the present methods for lung cancer are not based on an absolute serum level, but on an increase in level as compared to the median levels in matched control patients. As such, individual marker values as a total serum concentration are not measured; instead these three markers are incorporated in a composite score that has value only in re-categorizing patient risk for the presence of lung cancer.

In certain embodiments, three autoantibodies are utilized in the present lung cancer test, wherein the autoantibodies comprise anti-p53, anti-NY-ESO-1 and anti-MAPKAPK3. As noted above, most autoantibodies are only found in a limited number of patients. These three autoantibodies are among those most commonly found in lung cancer, although each on its own has a rather limited distribution. As members of an integrated biomarker panel, they do contribute to the overall sensitivity of the test. p53 is a well-known tumor suppressor protein that is often mutated in cancer. Such mutations may be enough to break natural immune tolerance to the protein and thus be the source of anti-p53 antibodies. NY-ESO-1 has been characterized as a tumor specific marker and thus auto-antibodies against this protein may represent a way to measure the levels of a tumor marker in early stage disease via immune amplification. MAPKAPK3 is a kinase protein that can be activated by several oncogenic pathways and thus may be more commonly up-regulated in lung cancer, leading to the development of autoantibodies targeted against it.

In certain embodiments, the method for determining a quantified increased risk for the presence of a lung cancer in an asymptomatic human subject comprises: 1) measuring a panel of markers in a sample from a human subject that is at least 50 years of age or older and has a history of smoking tobacco; 2) determining a normalized score for each marker; 3) summing the normalized scores to obtain a composite score for the human subject; 4) quantifying the increased risk for the presence of the lung cancer for the human subject as a risk score, wherein the composite score is matched to a risk category of a grouping of stratified human subject populations wherein each risk category comprises a multiplier indicating increased likelihood of having the lung cancer correlated to a range of composite scores; and, 5) providing a risk score for the human subject, whereby the quantified increased risk for the presence of the lung cancer in an asymptomatic human subject has been determined.

In certain embodiments, the step of normalizing comprises determining the multiple of median (MoM) score for each marker. In this instance, the MoM scores are then summed to obtain a composite score.

It is understood that the disease cohort (e.g. a human subject that is at least 50 years of age or older and has a history of smoking tobacco) is independently determined and in this instance is well understood to be the “at risk” group for developing lung cancer. The present method and algorithm re-categorizes those at-risk patients into risk categories quantifying their true increased risk for the presence of lung cancer over the disease cohort.

In other embodiments, provided herein are methods of assessing the likelihood that a patient has lung cancer relative to a population comprising the steps of: obtaining a sample from the patient; measuring the levels of multiple biomarkers in the sample; calculating a composite score from the biomarker measurements; comparing the patient composite score to the composite scores of persons known to be at a high and a low risk for lung cancer; and determining the level of risk of the patient for having lung cancer relative to the population.

In this instance, an asymptomatic patient's cancer risk level, relative to a population, is determined. In certain embodiments, the determination may comprise quantifying the risk level relative to a population. In other aspects, the multiple biomarkers comprise two or more, three or more, four or more, five or more or six or more biomarkers. In one embodiment, the multiple biomarkers comprise six markers selected from CEA, CA125, Cyfra 21-1, Pro-GRP, anti-NY-ESO-1, anti-p53, anti-Cyclin E2 and anti-MAPKAPK3.

In other embodiments, obtaining a composite score may further comprise normalizing the measured biomarker values and summing the normalized values to generate a composite score.

iii) Normalization of Data

In certain embodiments, the value obtained from measuring the marker in the sample is normalized. There is no intended limitation on the methodology used to normalize the values of the measured biomarkers provided that the same methodology is used for testing a human subject sample as was used to generate the Risk Categorization Table.

Many methods for data normalization exist and are familiar to those skilled in the art. These include methods as simple as background subtraction, scaling, multiple of the median (MoM) analysis, linear transformation, least squares fitting, etc. The goal of normalization is to equate the varying measurement scales for the separate markers such that the resulting values may be combined according to a separate weighting scale as determined and designed by the user and are not influenced by the absolute or relative values of the marker found within nature.

US Publ. No. 2008/0133141 (herein incorporated by reference) teaches statistical methodology for handling and interpreting data from a multiplex assay. The amount of any one marker thus can be compared to a predetermined cutoff distinguishing positive from negative for that marker, as determined from a control population study of patients with cancer and suitably matched normal controls, to yield a score for each marker based on said comparison; the scores for each marker are then combined to obtain a composite score for the marker(s) in the sample.

The predetermined cutoffs can be based on ROC curves and the score for each marker can be calculated based on the specificity of the marker. Then, the total score can be compared to a predetermined total score to transform that total score to a qualitative determination of the likelihood or risk of having lung cancer.

Another method for score transformation or normalization is, for example, applying the multiple of median (MoM) method of data integration. In the MoM method, the median value of each biomarker is used to normalize all measurements of that specific biomarker, for example, as provided in Kutteh et al. (Obstet. Gynecol. 84:811-815, 1994) and Palomaki et al. (Clin. Chem. Lab. Med. 39:1137-1145, 2001). Thus, any measured biomarker level is divided by the median value of the cancer group, resulting in a MoM value. The MoM values can be combined (namely, summed or added) for each biomarker in the panel resulting in a panel MoM value or aggregate MoM score for each sample.
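
The MoM normalization and summation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in the study: the function names and the reference measurements are hypothetical, only three of the panel markers are shown, and the reference group is taken to be the cancer group as stated above.

```python
from statistics import median


def reference_medians(reference_levels):
    """Median measured level of each marker across the reference group."""
    return {marker: median(values) for marker, values in reference_levels.items()}


def mom_scores(sample, medians):
    """Divide each measured level by that marker's reference median (MoM)."""
    return {marker: level / medians[marker] for marker, level in sample.items()}


def composite_score(sample, medians):
    """Sum the per-marker MoM values into one aggregate MoM score."""
    return sum(mom_scores(sample, medians).values())


# Hypothetical reference measurements in arbitrary units (not study data).
reference = {
    "CEA": [1.0, 2.0, 3.0],
    "CA125": [4.0, 8.0, 12.0],
    "Cyfra 21-1": [0.5, 1.0, 1.5],
}
medians = reference_medians(reference)

# Hypothetical patient sample measured on the same scales.
patient = {"CEA": 4.0, "CA125": 8.0, "Cyfra 21-1": 3.0}
print(mom_scores(patient, medians))      # per-marker MoM values
print(composite_score(patient, medians)) # aggregate MoM score
```

Because each level is divided by the median for its own marker, markers measured on very different scales contribute comparable values to the sum.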

In embodiments, as additional samples are tested and presence of cancer validated, the sample size of the cancer population and the normals for determining the median can be increased to yield more accurate population data.

In certain embodiments, normalization comprises determining a multiple of median (MoM) score for each biomarker measured.

In the next step of the present methods, the normalized values for the biomarkers are summed to provide a composite score for each subject. In certain embodiments, this method comprises summing the MoM scores to obtain a composite score.

In other words, the composite score is derived by measuring the levels of each of the markers used in a panel for a particular cancer in arbitrary units and comparing these levels to the median levels found in previous validation studies. In one embodiment, the cancer is lung cancer and the panel comprises the six markers disclosed above, wherein this method generates six initial scores representing the multiple of the median (MoM) for each marker for a given patient. These initial scores are summed to yield the final composite score.

In certain embodiments, the markers are measured and those resulting values normalized and then summed to obtain a composite score. In certain aspects, normalizing the measured biomarker values comprises determining the multiple of median (MoM) score. In other aspects, the present method further comprises weighting the normalized values before summing to obtain a composite score.

D) Risk Categorization Table

The next step of the present method comprises quantifying the increased risk for the presence of the cancer for the human subject as a risk score, wherein the composite score is matched to a risk category of a grouping of stratified human subject populations wherein each risk category comprises a multiplier indicating increased likelihood of having the cancer correlated to a range of composite scores. This quantification step is based on the pre-determined grouping of a stratified cohort of human subjects. In one embodiment, the grouping of a stratified population of human subjects, or stratification of a disease cohort, is in the form of a risk categorization table. The selection of the disease cohort, the cohort of human subjects that share cancer risk factors, is well understood by those skilled in the art of cancer research. In certain embodiments, the cohort may share an age category and smoking history. However, it is understood that the cohort, and the resulting stratification, may be more multidimensional and take into account further environmental or biological factors (e.g. epidemiological factors).

In certain embodiments, the grouping of a stratified human subject population used to determine a quantified increased risk for the presence of a cancer in an asymptomatic human subject comprises at least three risk categories, wherein each risk category comprises 1) a multiplier indicating increased likelihood of having the cancer, 2) a risk identifier and 3) a range of composite scores. In certain aspects, an individual risk score is generated by summing the normalized values determined from a panel of markers for the cancer to obtain a composite score that is correlated to a risk category of the risk categorization table. In a further aspect, the normalized values are determined as multiple of median (MoM) scores.

The risk identifier is a label given to a specific group to provide context for the range of risk scores and the multiplier indicating increased likelihood of having the cancer in each grouping. In certain embodiments, the risk identifier is selected from low risk, intermediate-low risk, intermediate risk, intermediate-high risk and highest risk. These risk identifiers are not intended to be limiting, but may include other labels as dictated by the data used to generate the table and/or to further refine the context of the data.

The multiplier indicating increased likelihood of having the cancer is a numerical value, such as 13.4; 5.0; 2.1; 0.7; and 0.4. This value is empirically derived and will change depending on the data, cohort of the subject population, type of cancer, biomarkers, etc. Thus, the multiplier indicating increased likelihood of having the cancer is a numerical value selected from 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, and 30, and so on, or some fraction thereof. The value indicates the increased risk, over the normal prevalence of cancer in the cohort population that formed the basis for the stratification, for the human subject at the time of testing. In other words, the human subject is from the same disease cohort as the one used to generate the risk categorization table. In the example of lung cancer, a disease cohort may be human subjects aged 50 years or older with a history of smoking tobacco. Thus, for example, if a patient receives a risk score of 13.4, then that human subject has a 13.4 times increased risk for the presence of the cancer relative to the population.

As disclosed above, this multiplier value is empirically determined and in the present instance is done using retrospective clinical samples. As such the stratification of human subjects is based on analysis of retrospective clinical samples from subjects having a cancer wherein the actual incidence of cancer, or the positive predictive score, is determined for each stratified grouping. The specifics of this are detailed below and in the example section.

In general, once a population of human subjects has been stratified a positive predictive score can be determined, when retrospective samples with a known medical history are used, for each stratified grouping. This actual incidence of cancer in each of these groups is then divided by the reported incidence of cancer across the population of human subjects. For example, if the positive predictive score for one of the groupings from the stratified population of human subjects was 27%, this value would then be divided by the actual incidence of cancer across the cohort of the population that was stratified (e.g. 2%) to yield a multiplier of 13.5. In this scenario, the multiplier indicating increased likelihood of having the cancer is 13.5 and a subject tested that had a composite score matched to this category would have a risk factor of 13.5. In other words, at the time of testing, that human subject would be 13.5 times more likely to have the presence of cancer than the general population in that particular cohort.
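
The division described above can be written out as a short Python sketch using the 27% and 2% figures from the example. The function name is an illustrative assumption.

```python
def risk_multiplier(positive_predictive_score, cohort_prevalence):
    """Incidence of cancer within a stratum divided by incidence in the cohort."""
    if not 0 < cohort_prevalence <= 1:
        raise ValueError("cohort_prevalence must be in (0, 1]")
    return positive_predictive_score / cohort_prevalence


# Figures from the example above: 27% within the stratum, 2% across the cohort.
multiplier = risk_multiplier(0.27, 0.02)
print(round(multiplier, 1))  # 13.5
```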

Thus, for example, a bead immunoassay was used to screen selected patients with lung cancer and normal individuals for the presence of a panel of three autoantibody and three antigen markers associated with lung cancer in a blinded study. One hundred thirty-four lung cancer patients and 121 age-matched smokers without lung cancer as controls contributed blood samples for testing. The assay employed fluorescence reporters and the degree of fluorescence was machine reported as a mean fluorescence intensity with a value ranging from 0 (lowest) to 5 (highest). The values obtained for each marker in the lung cancer population provided a median value, which then was used to determine the MoM value for experimental samples.

The plotted ROC curve has an AUC of 0.738. The specificity was 80% and the sensitivity was 59%.

The aggregate scores from the lung cancer patients and the normal cohort were stratified into five ranges. The specificity and sensitivity of each range were determined, where the sensitivity represented the number of cancer patients with scores in any one range divided by the total number of cancer patients, 134. The specificity was the number of the normal cohort with a score in that range subtracted from 121, divided by 121.
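
The per-range definitions above translate directly into code. The following Python sketch uses the cohort sizes stated in the text (134 cancer patients, 121 controls); the function name and the counts in the example call are hypothetical placeholders, not study results.

```python
def range_metrics(cancer_in_range, normal_in_range,
                  total_cancer=134, total_normal=121):
    """Sensitivity and specificity of one score range, as defined in the text."""
    # Sensitivity: cancer patients scoring in the range / all cancer patients.
    sensitivity = cancer_in_range / total_cancer
    # Specificity: (all normals - normals scoring in the range) / all normals.
    specificity = (total_normal - normal_in_range) / total_normal
    return sensitivity, specificity


# Hypothetical counts for a single range (illustration only).
sens, spec = range_metrics(cancer_in_range=67, normal_in_range=11)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```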

That stratification enabled a data transformation into a more qualitative risk categorization providing a greater degree of information for subsequent choices in light of the costs of lung cancer confirmation, for example a CAT scan or a PET scan, as well as patient compliance. Hence, because lung cancer incidence in the at risk population of heavy smokers is about 2%, that percentage was used as the cutoff point between likelihood of cancer and not, meaning, at that level the individual had an even chance of having cancer, that is, 1. Positive predictive values were determined using the disease prevalence of 2% and then that positive predictive value was divided by two to yield another risk value interpreted as the likelihood of having lung cancer as a multiple of that of the normal population risk, which can be considered as 1 or even chances, or as a 2% risk based on population studies.
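
The text does not spell out how the positive predictive values were computed from the 2% prevalence. The Python sketch below assumes the standard Bayes' rule form, which is an assumption rather than the Applicants' documented procedure. It is illustrated with the overall sensitivity (59%) and specificity (80%) reported above rather than with any single score range, and the function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Standard Bayes' rule PPV: P(cancer | positive test)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


prevalence = 0.02  # about 2% in the at-risk population, per the text
ppv = positive_predictive_value(sensitivity=0.59, specificity=0.80,
                                prevalence=prevalence)

# Expressing the PPV as a percentage and dividing by two (the 2% prevalence)
# gives the risk as a multiple of the population risk.
multiple_of_population_risk = (ppv * 100) / 2
print(round(ppv, 3), round(multiple_of_population_risk, 1))
```

Under these assumptions a result above the 80%-specificity cutoff corresponds to a risk a little under three times that of the at-risk population as a whole.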

The resulting risk categorization table is provided in FIG. 1. The third component of each risk category of the Risk Categorization Table is a range of composite scores. In the example provided above these composite scores were generated from normalizing the data from the panel of measured biomarkers and then summing the individual values from each marker per sample. These composite scores were then grouped to provide a range and drove the stratification of the population. The specifics of this methodology are detailed below in the Example section.

Once the composite score has been transformed to a risk category that is based on population data, the physician and patient can then assess whether follow-up is required, necessary or recommended based on whether the risk is just slightly above that of any smoker, i.e., 2%, or is higher because of a greater composite score, which may be deserving of greater consideration by the patient.

By further data transformation of the positive predictive value, the physician and patient will be the beneficiaries of a quantitative value with foundation in the prevalence of cancer amongst smokers which provides improved resolution on the risk of cancer in light of the biomarker assay. Hence, a patient with a composite biomarker score of 20 or greater has a 13-fold greater likelihood of having lung cancer than any other heavy smoker (see FIG. 1). That 13.4× multiplier translates to an overall risk of about 27% of having lung cancer. That is, while all heavy smokers have a 1 in 50 chance of having lung cancer prior to testing, with a composite score of 20 or more after testing, that individual has a 1 in 4 chance of having lung cancer. Therefore, that person should consider a follow-up to visualize whether any cancer is present.
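
The arithmetic behind the 27% and "1 in 4" figures is the multiplier applied to the pre-test prevalence. A minimal Python sketch, with a hypothetical function name:

```python
def post_test_risk(multiplier, prevalence):
    """Absolute risk after testing: multiplier times pre-test prevalence."""
    return min(1.0, multiplier * prevalence)


risk = post_test_risk(multiplier=13.4, prevalence=0.02)
print(f"{risk:.0%}")                   # 27%
print(f"about 1 in {1 / risk:.1f}")    # about 1 in 3.7, i.e. roughly 1 in 4
```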

Thus, in certain embodiments, the method for determining a quantified increased risk for the presence of lung cancer in an asymptomatic human subject comprises: 1) measuring a level of CEA, CA125, Cyfra 21-1, anti-NY-ESO-1, anti-p53 and anti-MAPKAPK3 in a serum sample from the human subject, wherein the human subject is at least 50 years of age or older and has a history of smoking tobacco; 2) determining a normalized score for each marker; 3) summing the normalized scores to obtain a composite score for the human subject; 4) quantifying the increased risk for the presence of the lung cancer for the human subject as a risk score, wherein the composite score is matched to one of at least three risk categories of a grouping of a stratified human subject population wherein each risk category comprises a multiplier indicating increased likelihood of having the lung cancer correlated to a range of composite scores; and, 5) providing a risk score for the human subject, whereby the quantified increased risk for the presence of the lung cancer in an asymptomatic human subject has been determined.

In certain embodiments, the step of normalizing comprises determining the multiple of median (MoM) score for each marker. In this instance, the MoM scores are then summed to obtain a composite score.

After quantifying the increased risk for presence of the cancer in the form of a risk score, this score may be provided in a form amenable to understanding by a physician. In certain embodiments the risk score is provided in a report. In certain aspects, the report may comprise one or more of the following: patient information, a Risk Categorization Table, a risk score, a test score, a composite score, identification of the risk category for the patient, an explanation of the Risk Categorization Table and the resulting test score, list of biomarkers tested, description of the disease cohort, and so on.

E) Use of Methods to Aid in the Early Detection of Lung Cancer

The use in a clinical setting of the methods and algorithms according to the present invention is now described in the context of lung cancer screening. It should be appreciated, however, that lung cancer is only one of many cancer types that can benefit from the present invention.

Primary care healthcare practitioners, who may include physicians specializing in internal medicine or family practice as well as physician assistants and nurse practitioners, are among the users of the methodology disclosed herein. These primary care providers typically see a large volume of patients each day and many of these patients are at risk for lung cancer due to smoking history, age, and other lifestyle factors. In 2012 about 18% of the U.S. population were current smokers and many more were former smokers with a lung cancer risk profile above that of never smokers.

The aforementioned NLST study (see Background section) concluded that heavy smokers over a certain age who undergo yearly screening with CT scans have a substantial reduction in lung cancer mortality as compared to those who are not similarly screened. Nevertheless, for the reasons discussed above, very few at risk patients are undergoing annual CT screening. For these patients the testing paradigm according to the present invention offers an alternative.

A blood sample from a patient with a heavy smoking history (e.g. having smoked at least a pack of cigarettes per day for 20 years or more) is sent to a laboratory qualified to test the sample using a panel of biomarkers with adequate sensitivity and specificity for early stage lung cancer. Non-limiting lists of such biomarkers are included in the above disclosure and the following examples. In lieu of blood, other suitable bodily fluids such as sputum or saliva might also be utilized.

A biomarker composite score for that patient is then generated using the technique described in the present disclosure. Using the composite score, the patient's risk of having lung cancer, as compared to others having a comparable smoking history and age range, can then be calculated using a table such as the one shown in FIG. 1. In lieu of the tabular format shown, other means of calculation may be employed, including those which utilize a computer program. In particular, if the risk calculation is to be made at the point of care, rather than at the laboratory, a software application compatible with mobile devices (e.g. a tablet or smart phone) may be employed in lieu of a table.
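
A computer program of the kind mentioned above could be as simple as a range lookup. The Python sketch below is illustrative only: FIG. 1 is not reproduced in this text, so the table is an assumption. The cutoffs at 6, 9 and 20 and the five multipliers are taken from values quoted elsewhere in this disclosure, the pairing of multipliers with risk identifiers is inferred from their order, the cutoff of 14 for the intermediate-high category is a pure placeholder, and the treatment of scores falling exactly on a boundary is also an assumption.

```python
from bisect import bisect_right

# (lower bound of composite score, risk identifier, multiplier)
# Assumed table; see the caveats in the paragraph above.
RISK_TABLE = [
    (0.0, "low risk", 0.4),
    (6.0, "intermediate-low risk", 0.7),
    (9.0, "intermediate risk", 2.1),
    (14.0, "intermediate-high risk", 5.0),  # placeholder cutoff, not from FIG. 1
    (20.0, "highest risk", 13.4),
]
LOWER_BOUNDS = [row[0] for row in RISK_TABLE]


def categorize(composite_score):
    """Match a composite score to its risk identifier and multiplier."""
    if composite_score < 0:
        raise ValueError("composite score cannot be negative")
    # Index of the last row whose lower bound does not exceed the score.
    index = bisect_right(LOWER_BOUNDS, composite_score) - 1
    _, identifier, multiplier = RISK_TABLE[index]
    return identifier, multiplier


print(categorize(22.5))  # ('highest risk', 13.4)
print(categorize(7.2))   # ('intermediate-low risk', 0.7)
```

Keeping the table as data rather than as branching logic means the ranges and multipliers can be replaced when the underlying stratification is updated, without changing the lookup itself.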

Once the physician or healthcare practitioner has a risk score for the patient (i.e. the likelihood that that patient has lung cancer relative to a population of others with comparable epidemiological factors) they can recommend, in particular, that those at a higher risk be followed up with other tests such as CT scanning. It should be appreciated that the precise numerical cut off above which further testing is recommended may vary depending on many factors including, without limitation, (i) the desires of the patients and their overall health and family history, (ii) practice guidelines established by medical boards or recommended by scientific organizations, (iii) the physician's own practice preferences, and (iv) the nature of the biomarker test including its overall accuracy and strength of validation data.

It is believed that use of the methodology disclosed herein will have the twin benefits of ensuring that the most at risk patients undergo CT scanning so as to detect early tumors that can be cured with surgery while reducing the expense and burden of false positives associated with stand-alone CT screening.

F) Kits

One or more biomarkers, one or more reagents for testing the biomarkers, cancer risk factor parameters, a Risk Categorization Table, algorithm for calculating a risk score, and any combinations thereof are amenable to the formation of kits (such as panels) for use in performing the present methods.

In certain embodiments, the kit can comprise (a) reagents containing at least one antibody for quantifying one or more antigens in a test sample, wherein said antigens comprise one or more of: cytokeratin 8, cytokeratin 19, cytokeratin 18, CEA, CA125, CA15-3, SCC, CA19-9, proGRP, Cyfra 21-1, serum amyloid A, alpha-1-anti-trypsin and apolipoprotein CIII; (b) reagents containing one or more antigens for quantifying at least one antibody in a test sample, wherein said antibodies comprise one or more of: anti-p53, anti-TMP21, anti-NPC1L1C-domain, anti-TMOD1, anti-CAMK1, anti-RGS1, anti-PACSIN1, anti-RCV1, anti-MAPKAPK3, anti-NY-ESO-1 and anti-Cyclin E2; and (c) one or more algorithms or computer programs for performing the steps of normalizing the amount of each antigen and/or antibody measured in the test sample, summing those normalized values to obtain a composite score and assigning a risk score or test score to each patient by correlating the composite score to a Risk Categorization Table and using the quantified increased risk for the presence of the cancer as an aid for further definitive cancer screening.

Alternatively, in lieu of one or more algorithms or computer programs, one or more instructions for manually performing the above steps by a human can be provided. The reagents included in the kit for quantifying one or more regions of interest may include an adsorbent which binds and retains at least one region of interest contained in a panel, solid supports (such as beads) to be used in connection with said adsorbents, one or more detectable labels, etc. The adsorbent can be any of many adsorbents used in analytical chemistry and immunochemistry, including metal chelates, cationic groups, anionic groups, hydrophobic groups, antigens and antibodies.

In certain embodiments, the kit comprises the necessary reagents to quantify at least one of the following antigens: cytokeratin 19, cytokeratin 18, CA 19-9, CEA, CA-15-3, CA125, SCC, Cyfra 21-1, serum amyloid A, and ProGRP. In another embodiment, the kit comprises the necessary reagents to quantify at least one of the following antibodies: anti-p53, anti-TMP21, anti-NPC1L1C-domain, anti-TMOD1, anti-CAMK1, anti-RGS1, anti-PACSIN1, anti-RCV1, anti-MAPKAPK3, anti-NY-ESO-1 and anti-Cyclin E2.

In some embodiments, the kit further comprises one or more algorithms or computer programs for performing some or all the steps of the method described herein. The kit may further comprise an apparatus configured with a computer program to receive the values from the evaluation of markers in a sample, make the required calculations to determine a composite score, compare it to a grouping of stratified population comprising multiple risk categories (e.g. a Risk Categorization Table) and provide a Risk Score.

G) Apparatus

The present invention further provides for an apparatus for assessing a subject's risk level for the presence of cancer and correlating with an increase or decrease of the presence of cancer after testing relative to a population. The apparatus comprises a computer program or software application to receive the values from the evaluation of markers in a sample and make the required calculations to determine a composite score and compare it to a grouping of stratified population comprising multiple risk categories (e.g. a Risk Categorization Table) and provide a Risk Score. The methods for obtaining and calculating a composite score and risk score are described above.

The apparatus can take one of a variety of forms. For example, the correlation and means of matching can be provided as a computer program in any format known to a person of ordinary skill in the art that allows the method to be implemented in a handheld device, a tablet, or any other type of computer or electronic device. The apparatus can be a computer software product, an application for a handheld device, a handheld device configured to perform the method, a world-wide-web (WWW) page or other network accessible location, or a computing device. Alternatively, the apparatus can be a simple functional representation of the correlation such as a nomogram provided on a card, or wheel, that is readily portable and simple to use. For example, the apparatus can be in the form of a laminated card or wheel. Accordingly, the correlation can be a graphic representation, which, in some embodiments, is stored in a database or memory, such as a random access memory, read-only memory, disk, virtual memory or processor. Other suitable representations, pictures, depictions or exemplifications known in the art may also be used.

The apparatus may further comprise a storage means for storing the correlation or nomogram, an input means that allows the input into the apparatus of the identical set of factors determined for a subject, and a display means for displaying the status of the subject in terms of the particular medical condition. The storage means can be, for example, random access memory, read-only memory, a disk, virtual memory, a database, or a processor. The input means can be, for example, a keypad, a keyboard, stored data, a touch screen, a voice-activated system, a downloadable program, downloadable data, a digital interface, a hand-held device, or an infrared signal device. The display means can be, for example, a computer monitor, a cathode ray tube (CRT), a digital screen, a light-emitting diode (LED), a liquid crystal display (LCD), an X-ray, a compressed digitized image, a video image, or a hand-held device. The apparatus can further comprise a database, wherein the database stores the correlation of factors and is accessible to the user.

In one embodiment of the present invention, the apparatus is a computing device, for example, in the form of a computer or hand-held device that includes a processing unit, memory, and storage. The computing device can include, or have access to, a computing environment that comprises a variety of computer-readable media, such as volatile memory and non-volatile memory, removable storage and/or non-removable storage. Computer storage includes, for example, RAM, ROM, EPROM & EEPROM, flash memory or other memory technologies, CD ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other medium known in the art to be capable of storing computer-readable instructions. The computing device can also include or have access to a computing environment that comprises input, output, and/or a communication connection. The input can be one or several devices, such as a keyboard, mouse, touch screen, or stylus. The output can also be one or several devices, such as a video display, a printer, an audio output device, a touch stimulation output device, or a screen reading output device. If desired, the computing device can be configured to operate in a networked environment using a communication connection to connect to one or more remote computers. The communication connection can be, for example, a Local Area Network (LAN), a Wide Area Network (WAN) or other networks and can operate over a wired network, wireless radio frequency network, and/or an infrared network.

All references cited herein are incorporated by reference in their entirety.

EXAMPLES

The Examples below are given so as to illustrate the practice of this invention. They are not intended to limit or define the entire scope of this invention.

Example 1: Study of Lung Cancer Biomarker Expression in Retrospective Clinical Samples

Over 1000 blood samples from patients with all stages of lung cancer, the at risk population (20 pack-year smokers over 50 years of age) and various other control groups including those with non-cancerous lung disorders and other cancers (prostate, breast and colorectal) are provided herein (see FIG. 2). These samples were collected from multiple cohorts of patients over a 5 year period from several sites both in the United States and in Europe.

FIG. 3 shows a receiver operator characteristic (ROC) curve analysis of all lung cancer vs. all non-cancer samples, which yielded an area under the curve (AUC) of 0.76. Choosing a cutoff of 10.7, which gives a specificity of 80%, yields a sensitivity of 64%. The data was further analyzed as a function of tumor stage using the same cutoff, yielding 75% sensitivity in late stage disease and 59% sensitivity for early stage disease, indicating that the test has a higher sensitivity for later stage disease. Amongst the non-cancerous patients, the specificity of the test was not affected by either smoking status or the presence of noncancerous lung disorders (including asthma, COPD, emphysema, fibrosis and pneumonia), all of which yielded scores in the same proportions as the overall normal population. Analysis of the scores of three other cancer types (prostate, breast and colorectal) demonstrated that these other cancers can yield a higher score more often than non-cancerous conditions, but that the range and median of scores is more similar to that found in the normal population.
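
Fixing the specificity and reading off the sensitivity, as done above, can be sketched as follows in Python. This is one simple way to pick such a cutoff and is not necessarily the procedure used in the study. The function names and the score lists are hypothetical, and a sample is assumed to be called positive when its composite score is at or above the cutoff.

```python
def specificity(control_scores, cutoff):
    """Fraction of controls scoring below the cutoff (true negatives)."""
    return sum(score < cutoff for score in control_scores) / len(control_scores)


def sensitivity(cancer_scores, cutoff):
    """Fraction of cancer samples scoring at or above the cutoff."""
    return sum(score >= cutoff for score in cancer_scores) / len(cancer_scores)


def cutoff_for_specificity(control_scores, target):
    """Lowest observed control score that reaches the target specificity."""
    for candidate in sorted(set(control_scores)):
        if specificity(control_scores, candidate) >= target:
            return candidate
    raise ValueError("no observed control score reaches the target specificity")


# Hypothetical composite scores (illustration only, not study data).
controls = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
cancers = [5, 8, 11, 12, 15, 20, 9, 10, 13, 7]

cutoff = cutoff_for_specificity(controls, target=0.80)
print(cutoff, specificity(controls, cutoff), sensitivity(cancers, cutoff))
```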

A further validation study was performed using a cohort of 322 samples obtained with the specific intent of early detection of cancer in the high risk population (FIG. 4). All of the cancers in this study were early stage cancers and the control group specifically consisted of age-matched long-term heavy smokers. An ROC curve analysis was again performed and yielded an AUC of 0.73 (FIG. 5). Applying the cutoff determined in the development stage yielded a specificity of 80% and a sensitivity of 57%.

Example 2: Validation of Biomarker Panel and Assay Redesign

Applicants redesigned the assay system to allow for multiplexing of biomarker detection and thus increased efficiency in the clinical diagnostic lab. As part of this work, Applicants performed a full analytical validation of the test in accordance with standard clinical laboratory practice. Specifically, assay linearity, precision and reproducibility were assessed for each of the six biomarkers. FIG. 6 presents the linearity of one of the tumor markers in a spike and recovery assay. All 6 biomarkers, those disclosed above, are detected in the linear range and have r² values of >0.9. Precision and repeatability were determined by testing 3 samples twice a day for 5 days each for two independent operators. An additional 5 days of data were collected for a single operator. All testing was performed in duplicate. FIG. 7 displays representative data for 3 of the markers.

A clinical bridging study was also performed in which 181 blood samples that had been previously tested by others were retested using the redesigned assay. The data indicated a slight decrease in the clinical specificity and sensitivity of the assay in the redesigned system. This loss was deemed acceptable as it may be attributable to some loss in sample integrity due to the aging and handling of the tested samples in the period of time since the original testing by others.

Example 3: Final Validation Study

The Applicants performed a blinded retrospective study of the present methods using a total of 255 patients including 134 confirmed diagnoses of lung cancer and 121 age-matched >20 pack-year smokers as controls. The study group included two cohorts of patients collected at two separate cancer centers; one in the Northeastern United States and one in the Southwest. Cancer patients were a mix (50:50) of early and late stage. All six biomarkers were tested and analyzed to yield the composite score. The data is presented as a box plot in FIG. 8. A ROC curve analysis yielded an AUC of 0.74 and applying a cutoff to hold the specificity at 80% yields a sensitivity of 59% (FIGS. 4B & C). This data is in good agreement with the previous studies (FIGS. 2-5).

Example 4: Clinical Utility and Risk Categorization

The above described biomarker panel and methods are intended for use as an aid to determine which high-risk patients need to be directed to appropriate non-invasive diagnostic follow-up, especially chest CT scan for patients at high risk for lung cancer. More specifically, it is intended for asymptomatic individuals ≥50 years of age with a history of tobacco use of ≥20 pack-years and who are either current or former smokers. This test is not indicated for individuals who have had a previous diagnosis of cancer, or who currently have symptoms indicative of lung cancer, or who are already enrolled in and complying with an annual CT screening program. The test is not for use to render a diagnosis of lung cancer; a definitive diagnosis of lung cancer can only be rendered histologically and/or cytologically.

The test generates a risk score based on the levels of 6 biomarkers in patient serum. This score is an indicator of the level of risk for each patient of currently having lung cancer relative to others with a comparable smoking history. Applicants have herein developed a risk categorization tool based on test experience resulting from retrospective studies performed during the development of the test (FIG. 1). This table indicates the likelihood that a patient in a given score range has cancer at the time of testing. Likelihoods are based on a known prevalence of lung cancer of 1.5-2.0% in the at-risk population (>50 years of age, >20 pack-years smoking history). The result of the test helps the physician determine whether the risk that a patient has cancer warrants follow-up with a chest CT scan. The decision for follow-up is also based on specific factors associated with the individual patient (overall health, family history, insurance, interest level in early detection, etc.).

An expanded table (FIG. 9) indicates the sensitivity, specificity, accuracy, and positive and negative predictive values obtained from the present methods when all patients with a score above the given value are referred for further follow-up. The table further indicates the number of cancers detected and the number of false positive results generated out of a patient base of 1000 high-risk individuals when the given cutoff is used. Note, for example, that if all individuals with an Intermediate Risk or higher are sent to CT, 10 out of 20 cancers (50%) will be detected early while only 137 out of 1000 (14%) of asymptomatic, heavy smokers over age 50 will need to be subjected to CT scanning. Using this cutoff would substantially reduce false positives associated with CT screening but could miss as many as half of early cancers if the blood test is given only once. (We anticipate fewer false negatives from serial testing 1-2 times per year.) It should be noted that even with a sensitivity of 51%, a patient with a score below 9 only has about a 1% chance of having lung cancer at the time of the test based on our data. On the other hand, referring all patients with at least an Intermediate-Low Risk (i.e. test score of greater than 6) would improve sensitivity to better than 80% but would make less of an improvement in reducing the false positive rates of CT scans.
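The per-1000 arithmetic in the preceding paragraph can be reproduced with a short sketch. The 2% prevalence is the upper end of the range stated above. The specificity of 0.87 is an assumption, back-calculated here so that roughly 137 of 1000 subjects are referred; it is not a value read from FIG. 9. The function name is hypothetical.

```python
def screening_outcomes(sensitivity, specificity, prevalence, n=1000):
    """Expected counts when n high-risk subjects are tested at one cutoff."""
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = sensitivity * diseased          # cancers detected
    false_pos = (1 - specificity) * healthy    # non-cancer subjects sent to CT
    return {
        "cancers_present": diseased,
        "cancers_detected": true_pos,
        "false_positives": false_pos,
        "referred_to_ct": true_pos + false_pos,
    }

# Sensitivity 50% and prevalence 2% are from the text; specificity 0.87 is assumed.
result = screening_outcomes(sensitivity=0.50, specificity=0.87, prevalence=0.02)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions about 20 cancers are present, about 10 are detected, and roughly 137 subjects in total are referred to CT.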

The present risk categorization table and method for lung cancer is a multiplex immunoassay that determines the risk of the presence of lung cancer in asymptomatic individuals who are greater than 50 years of age and current or former smokers with greater than or equal to a twenty pack-year smoking history (less than or equal to 15 years since last use of tobacco for former smokers). The test analyzes six biomarkers that give a composite score that categorizes patients into lung cancer risk categories based on empiric data from retrospective clinical studies. It is intended that results of this test are to be used in conjunction with other clinical data to determine the appropriate diagnostic follow-up.

Example 5: Validation of the Algorithm to Determine in Asymptomatic Human Subjects a Quantified Increased Risk for the Presence of Lung Cancer

A multiplex diagnostic platform is an automated comprehensive system capable of isolating the target analyte (protein antigen or autoantibody), performing the test, and displaying the interpretation of the multiplex test result. To accomplish our multiplexed test we use a flow cytometry bead-based approach. Multiplex bead array assays provide quantitative measurement of large numbers of analytes using an automated 96-well plate format. The Luminex method uses microsphere sets carrying variable quantities of two different fluorescent dyes that produce up to 100 different shades of color. Each bead is coupled to a unique antibody or protein that recognizes a specific molecule. After the beads are mixed with a serum sample and added to the instrument, the unique color signature on each bead reveals the identity of the bound molecules. The level of fluorescence (reported as Median Fluorescence Intensity or MFI) of the tagged antibody or protein indicates the level of antibody or protein in the serum.

Our panel of biomarkers includes 3 autoantibodies (p53 (Pierce RP-39232), NY-ESO-1 (Pierce RP-39227), and MAPKAPK3 (Genway 10-782-55070)) and 3 tumor markers (CA125, CEA and CYFRA 21-1). These three autoantibody markers as well as the protein CEA marker (anti-CEA, Abcam ab4451) are produced in-house using the Luminex beads/platform technology. Commercially available reagents for CA125 and Cyfra 21-1 (Millipore HCCBP1MAG-58K-02) are used.

Autoantibody Assay

In this assay, protein (antigen) is coupled to Luminex beads. The beads (3 unique color signatures, each carrying a single biomarker protein) are then incubated with the patient serum (capture of the specific autoantibody). After incubation and washing steps the bead/antibody complex is exposed to the fluorescent labeled anti-human reporter antibody (Thermo, PAI-86078). The complex is then washed again and placed in the Luminex instrument. The color signature distinguishes the biomarker being measured and the median fluorescence intensity of the reporter indicates the amount of the autoantibody of interest. NY-ESO-1 is coupled to Luminex bead, region 35 (Luminex, MC10035), p53 is coupled to Luminex bead, region 43 (Luminex, MC10043) and MAPKAPK3 is coupled to Luminex bead, region 45 (Luminex, MC10043).
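How a bead's region identifies the marker while the reporter signal gives the MFI can be sketched as below. This is an illustration only: the instrument software performs this step, the per-bead event format is a hypothetical simplification, and no Luminex API is implied. Only the region-to-marker assignments come from the text above.

```python
from collections import defaultdict
from statistics import median

# Bead region -> autoantibody marker, as assigned in the text above.
REGION_TO_MARKER = {35: "NY-ESO-1", 43: "p53", 45: "MAPKAPK3"}

def median_fluorescence_by_marker(events):
    """events: iterable of (bead_region, reporter_intensity) pairs.
    Beads from regions not in the panel are ignored."""
    grouped = defaultdict(list)
    for region, intensity in events:
        marker = REGION_TO_MARKER.get(region)
        if marker is not None:
            grouped[marker].append(intensity)
    return {marker: median(values) for marker, values in grouped.items()}

# Hypothetical bead events, for illustration only.
events = [(35, 12.0), (35, 15.0), (35, 14.0),
          (43, 20.0), (43, 16.0),
          (45, 9.0), (99, 500.0)]
print(median_fluorescence_by_marker(events))
```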

Tumor Protein Assay

In this assay an antibody to the protein of interest is coupled to the surface of a Luminex bead. The bead is then incubated with the patient serum. The protein of interest binds to the antibody coated bead (capture). Next, a second antibody (detection) is incubated with the capture antibody-protein complex. The detection antibody is labeled with a fluorescent tag. After washing unbound material away, the complex or “sandwich” (capture antibody-protein-detection antibody) is placed in the Luminex instrument. The color signature of the Luminex bead indicates the analyte being measured and the Median Fluorescence Intensity (MFI) measures the amount of protein biomarker present in the sample.

The two assays differ in incubation times and other conditions, so two separate multiplex assays are performed. The data are combined and the output placed into our data analysis sheet/calculator. The values from each of the markers are normalized by calculating the multiple of median (MoM) score for each individual marker, and then the sum of all six MoM scores is correlated to a risk category of the Risk Categorization Table. This risk score is provided in a report to the physician for their use.
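The lookup from composite score to risk category can be sketched as a simple threshold function. The cutoffs used here are the five ranges listed in Example 6; the function name is hypothetical, and the fold-risk multipliers of FIG. 1 are not reproduced because they are not stated in the text.

```python
def risk_category(composite_score):
    """Map a composite (summed MoM) score to one of the five risk categories
    using the cutoffs listed in Example 6."""
    if composite_score > 20:
        return "Highest Risk"
    if composite_score > 14:
        return "Intermediate-High Risk"
    if composite_score > 9:
        return "Intermediate Risk"
    if composite_score > 6:
        return "Intermediate-Low Risk"
    return "Low Risk"

for score in (4.0, 7.5, 12.0, 18.0, 119.4):
    print(score, risk_category(score))
```

Each upper bound is inclusive, so a score of exactly 9 falls in the Intermediate-Low category and a score of exactly 20 in the Intermediate-High category.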

For example, the Median Fluorescence Intensity (MFI) values (data not shown) of each marker tested from patient samples, as determined by the MagPix Instrument, were transferred to a Data Analysis Worksheet. The mean, standard deviation and % CV were then calculated for the triplicate MFI values. After the background MFI was subtracted, the MoM was calculated for each marker. After the individual MoM values of the six markers were calculated, they were added together to provide an aggregate or composite score. The sum of the MoM values (or composite score) was then assigned to the patient and reported as the increased risk for the presence of lung cancer. The numerical value for the risk score was obtained by correlating the composite score to the Risk Categorization Table.
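The worksheet step for one marker (mean, standard deviation, % CV, then background subtraction) might look like the following sketch. It assumes the sample standard deviation is used, which the text does not specify; the function name and the example readings are hypothetical.

```python
from statistics import mean, stdev

def triplicate_summary(mfi_values, background_mfi):
    """Summarize replicate MFI readings for one marker.
    Uses the sample standard deviation (an assumption)."""
    avg = mean(mfi_values)
    sd = stdev(mfi_values)
    cv_percent = 100.0 * sd / avg if avg else float("nan")
    return {
        "mean": avg,
        "sd": sd,
        "cv_percent": cv_percent,
        "net": avg - background_mfi,  # background-subtracted value used for the MoM
    }

# Hypothetical triplicate readings and background, for illustration only.
print(triplicate_summary([100.0, 110.0, 90.0], background_mfi=17.0))
```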

Example 6: Generation of a Risk Categorization Table for Lung Cancer

The stepwise construction of a risk categorization table was performed as follows (see FIG. 1). First, a table of data was constructed by performing the multi-analyte test on a cohort comprising 121 control non-cancer subjects and 134 lung cancer subjects. For each subject of the study, in one row of a spreadsheet program (Microsoft EXCEL) the sum of the MoMs for the six markers was aligned with the clinical condition, i.e. cancer or non-cancer, such that all subjects of the same condition were in contiguous rows. (To facilitate performance of the following steps manually, at this point the data may be sorted by scores in descending or ascending order before proceeding.) The second step was to select a specific number of risk categories that are considered to be clinically relevant to the relative need to perform follow-up procedures. In this example it was decided to use five risk categories. Thirdly, each of the five risk categories was assigned a range of MoM scores, in which increasing MoM score ranges would be associated with higher risk categorizations. Five ranges were defined by selecting 5 pairs of specific cutoffs, which were: score>20 (Highest Risk); 14<score≤20 (Intermediate-High Risk); 9<score≤14 (Intermediate Risk); 6<score≤9 (Intermediate-Low Risk); and score≤6 (Low Risk). In statistics and diagnostic testing, the positive predictive value, or precision rate, is the proportion of positive test results that are true positives (such as correct diagnoses). For each risk category, a positive predictive value was calculated by using a standard formula known in the art, which is applicable to data from cohort studies in which arbitrary numbers of disease and control subjects are selected by the experimenter, which is:

PPV=SE*PR/((SE*PR)+(1-SP)*(1-PR))

In which PPV is the Positive Predictive value;

-   SP is the specificity, which is defined by the formula (negative tests, disease absent)/[(negative tests, disease absent)+(positive tests, disease absent)];
-   SE is the sensitivity, which is defined by the formula (positive tests, disease present)/[(positive tests, disease present)+(negative tests, disease present)];
-   and the prevalence (PR) is an estimate of the frequency of occurrence of the disease in the population of individuals who are to be screened for the disease, as restricted by known risk factors for the disease (i.e. for lung cancer, the known major epidemiological risk factors include age, gender, smoking intensity, and possibly time since cessation of tobacco use). See FIGS. 9 and 10. [Bach, P. B., et al., Variations in Lung Cancer Risk Among Smokers. Journal of the National Cancer Institute, 2003. 95(6): p. 470-478.; Bach, P. B., et al., Screening for Lung Cancer: ACCP Evidence-Based Clinical Practice Guidelines (2nd Edition). CHEST Journal, 2007. 132(3_suppl): p. 69S-77S.; Spitz, M. R., et al., A Risk Model for Prediction of Lung Cancer. Journal of the National Cancer Institute, 2007. 99(9): p. 715-726.; Tammemagi, C. M., et al., Lung Cancer Risk Prediction: Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial Models and Validation. Journal of the National Cancer Institute, 2011. 103(13): p. 1058-1068.]

Furthermore, for each risk category, to calculate the post-test risk of having cancer given a positive blood test, the positive predictive value was then divided by the prevalence of lung cancer or pretest epidemiologic risk of having lung cancer as obtained from epidemiological studies. In those calculations, the final calculation is a dimensionless number because it is a ratio of two decimal fractions, i.e. positive predictive value divided by the subject's lung cancer prevalence given an age range and a range of behavioral smoking intensity such as may be mitigated by recent smoking cessation. This number is the subject's absolute, post-test, fold risk of having lung cancer, as compounded of both their estimated personal risk factors and the result of their blood test.
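The PPV formula and the fold-risk ratio described above can be written out directly. This is a minimal sketch with hypothetical function names; the example inputs reuse the 59% sensitivity and 80% specificity reported in Example 3 together with a 2% prevalence, and are not the per-category values behind FIG. 1.

```python
def positive_predictive_value(se, sp, pr):
    """PPV = SE*PR / ((SE*PR) + (1-SP)*(1-PR)), with all inputs as fractions."""
    return (se * pr) / ((se * pr) + (1 - sp) * (1 - pr))

def fold_risk(se, sp, pr):
    """Post-test risk expressed as a multiple of the pre-test prevalence."""
    return positive_predictive_value(se, sp, pr) / pr

ppv = positive_predictive_value(se=0.59, sp=0.80, pr=0.02)
print(f"PPV: {ppv:.4f}")
print(f"Fold risk: {fold_risk(0.59, 0.80, 0.02):.2f}")
```

A test with no discriminating power (for example SE = SP = 0.5) returns a PPV equal to the prevalence and therefore a fold risk of 1, which is a convenient sanity check on the formula.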

It is also possible to elaborate a further series of more accurate risk stratification tables that could be devised to tailor the predicted post-test risk to account for the dependence of the subject's pre-test risk on various factors which include age, gender, smoking intensity and years since smoking cessation. It is further possible to employ a recursive strategy to devise risk categories that result in a predetermined series of risk levels, such as 0.5-fold, 1.0-fold, 3-fold, 5-fold and 10-fold, which could be derived by modifying the cutoffs until the required categories of a series are realized.

Example 7: Patient Test Results

In the latter half of 2012 a blood test for the early detection of lung cancer was offered to primary care physicians in the Washington, D.C. area. Approximately 250 blood samples were received, tested and scored according to the method of the present invention. See www.BloodTestforLungCancer.com. The test results, including the risk score, were reported to the treating physician. The Aggregate MoM Values for these patients ranged from 0 to 248, with 5% deemed to equate to an intermediate risk or higher (see Risk Categorization Table, FIG. 1).

For most of these samples, approximately 2 ml of blood were drawn in a serum separator tube then spun, sent to the laboratory, and within two days the multiplex biomarker test was performed in the manner set forth in the preceding Examples. Six proteins were analyzed in the panel on the Luminex Magpix, including 3 cancer biomarkers and 3 autoantibodies. For the cancer biomarkers, five microliters of plasma were diluted in 95 microliters of buffer, and for the autoantibodies, three microliters of plasma were diluted in 57 microliters of buffer. These were run in triplicate and the negative control values were subtracted from these values. The multiple of median (MoM) was calculated for each of the proteins by dividing the average-minus-background value for the patient by the median value for all patients. The score was determined by the sum of the MoM values.

In December 2012 a blood sample from a 51 year old high-risk patient was received and tested according to the methodology set forth in the preceding Examples. The Aggregate MoM Value (i.e. Composite Score) for this patient was reported as 120, corresponding to the highest risk for lung cancer (at least 13.4 times increased likelihood of having lung cancer in the high-risk smoking population).

TABLE A
Data and result of lung cancer test for the patient with Composite Score of 120.

                                        Cancer Biomarkers            Autoantibodies
                                        CEA      CA125    Cyfra      NY-ESO-1   p53     MAPKAPK3
Average Value                           1927.5   106.8    73.5       14.5       16.5    14.2
Average Background                      17.0     17.3     34.0       20.0       7.3     12.7
Average value - Background (Av-Bkg)     1910.5   89.4     39.5       −5.5       9.2     1.5
Median                                  19       10       4          33         83      41
MOM (Av-Bkg/Median)                     100.6    8.9      9.9        −0.1       0.1     0.0
SUM                                     119.4
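As a check on the arithmetic, the MoM values and composite score can be recomputed from the background-subtracted values and medians given in Table A. The sketch below uses only those table values; the variable names are hypothetical. Summing unrounded MoM values gives roughly 119.3, slightly below the tabulated 119.4, which is the sum of the per-marker values after rounding to one decimal place.

```python
# Background-subtracted averages (Av-Bkg) and population medians from Table A.
net_values = {"CEA": 1910.5, "CA125": 89.4, "Cyfra": 39.5,
              "NY-ESO-1": -5.5, "p53": 9.2, "MAPKAPK3": 1.5}
medians = {"CEA": 19, "CA125": 10, "Cyfra": 4,
           "NY-ESO-1": 33, "p53": 83, "MAPKAPK3": 41}

# Multiple of median (MoM) for each marker, then the composite score.
mom = {marker: net_values[marker] / medians[marker] for marker in net_values}
composite_score = sum(mom.values())

for marker, value in mom.items():
    print(f"{marker}: {value:.2f}")
print(f"Composite score: {composite_score:.2f}")
```

Either way the score is far above the cutoff of 20 for the highest risk category, and it is dominated by the CEA value.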

A chest CT scan was performed 4 days later that showed a 5.4 cm mass at the hilum of the right lung, and a PET scan performed the same day was positive, consistent with the presence of lung cancer. The patient has been referred to an oncologist for further evaluation.

The results from these 250 patient samples demonstrate in a real world clinical setting that the method and algorithm according to the present invention assist in categorizing some patients as lower risk and others as higher risk.

What is claimed is:
 1. A method of determining a risk level for the presence of a cancer in an asymptomatic human subject relative to a population, comprising: measuring at least one cancer marker in a sample from the human subject and normalizing measured values, wherein the at least one cancer marker is from a panel of markers associated with the presence of the cancer; summing each normalized value to obtain a composite score; and, matching the composite score to a predetermined risk score, whereby the risk level for the presence of the cancer in an asymptomatic human subject has been determined.
 2. The method of claim 1, wherein the cancer is selected from lung cancer, kidney cancer, breast cancer, bile duct cancer, bone cancer, pancreatic cancer, cervical cancer, colon cancer, colorectal cancer, gallbladder cancer, liver or hepatocellular cancer, ovarian cancer, testicular cancer, lobular carcinoma, prostate cancer, and skin cancer or melanoma.
 3. The method of claim 1, wherein the cancer is lung cancer.
 4. The method of claim 1, wherein the panel of markers comprise two or more markers selected from CEA, CA125, Cyfra 21-1, Pro-GRP, anti-NY-ESO-1, anti-p53, anti-Cyclin E2 and anti-MAPKAPK3.
 5. The method of claim 1, wherein the normalizing comprises determining the multiple of median (MoM) score for each marker.
 6. The method of claim 1, wherein the predetermined risk score is obtained from a grouping of a stratified human subject population comprising at least three risk categories wherein each risk category comprises a risk score correlated to a range of composite scores.
 7. The method of claim 1, wherein the predetermined risk score is determined from retrospective clinical samples.
 8. The method of claim 6, wherein the risk category further comprises a risk identifier.
 9. The method of claim 8, wherein the risk identifier is selected from low risk, intermediate-low risk, intermediate risk, intermediate-high risk and highest risk.
 10. The method of claim 1, wherein the predetermined risk score is obtained from a risk categorization table comprising a risk value correlated to a range of composite scores.
 11. The method of claim 1, further comprising weighting the measured values of the markers.
 12. The method of claim 1, wherein the risk score is provided to a physician as the human subject's increased risk of having the presence of the cancer relative to a rate of having the cancer in a same cohort before testing.
 13. A method of assessing the likelihood that a patient has lung cancer relative to a population comprising the steps of: (i) obtaining a fluid sample from said patient; (ii) measuring the levels of multiple biomarkers in said sample; (iii) calculating a composite score from said biomarker measurements; (iv) comparing said patient composite score to the composite scores of persons known to be at a high and a low risk for lung cancer; and (v) determining the level of risk of the patient for having lung cancer relative to the population.
 14. The method of claim 13, wherein the patient and the population are over age 50 and have had a history of smoking cigarettes.
 15. The method of claim 14, wherein the smoking history comprises at least about a 20 pack year smoking history.
 16. The method of claim 13, wherein the multiple biomarkers comprise two or more markers selected from CEA, CA125, Cyfra 21-1, Pro-GRP, anti-NY-ESO-1, anti-p53, anti-Cyclin E2 and anti-MAPKAPK3.
 17. The method of claim 13, wherein the composite score of the patient is compared to the composite score of patients in at least three categories comprising those at high, intermediate, and low risk.
 18. The method of claim 13, wherein the composite score of the patient is compared to the composite score of patients in at least five categories comprising those at highest, intermediate high risk, intermediate risk, intermediate low risk, and low risk.
 19. A method of determining a risk level for the presence of lung cancer in an asymptomatic human subject relative to a population, comprising: measuring a panel of markers in a human subject; determining a multiple of median (MoM) score for each marker; summing the MoM score to obtain a composite score for each human subject, quantifying the increased risk for the presence of the cancer for the human subject as a risk score, wherein the composite score is matched to a risk category of a grouping of stratified human subject populations wherein each risk category comprises a multiplier indicating increased likelihood of having the lung cancer correlated to a range of composite scores; and, providing a risk score for the human subject, whereby the quantified increased risk for the presence of the lung cancer in an asymptomatic human subject has been determined.
 20. The method of claim 19, wherein the panel of markers comprises two or more markers selected from CEA, CA125, Cyfra 21-1, Pro-GRP, anti-NY-ESO-1, anti-p53, anti-Cyclin E2 and anti-MAPKAPK3.
 21. The method of claim 19, wherein the panel of markers is selected from CEA, CA125, Cyfra 21-1, anti-NY-ESO-1, anti-p53 and anti-MAPKAPK3.
 22. The method of claim 19, wherein the sample is blood, blood serum, blood plasma, or some part thereof.
 23. The method of claim 19, wherein the grouping of a stratified human subject population, the multiplier indicating increased likelihood of having the cancer and the range of composite scores are determined from retrospective clinical samples of a population.
 24. The method of claim 19, wherein the risk category further comprises a risk identifier.
 25. The method of claim 24, wherein the risk identifier is selected from low risk, intermediate-low risk, intermediate risk, intermediate-high risk and highest risk.
 26. The method of claim 19, wherein calculating the multiplier indicating increased likelihood of having the cancer for each risk category comprises stratifying the human cohort based on retrospective MoM scores and weighting a known prevalence of the cancer in the cohort by a positive predictive score for each stratified population.
 27. The method of claim 19, wherein the grouping of a stratified human subject population comprises at least three risk categories wherein the multiplier indicating increased likelihood of having cancer is about 2 or greater.
 28. The method of claim 19, wherein the grouping of a stratified human subject population comprises at least two risk categories wherein the multiplier indicating increased likelihood of having cancer is about 5 or greater.
 29. The method of claim 19, wherein the human subject is aged 50 years or older and has a history of smoking tobacco.
 30. A method of determining a quantified increased risk for the presence of lung cancer in an asymptomatic human subject, comprising: measuring at least one cancer marker in a sample from the human subject, wherein the human subject is at least 50 years of age or older and has a history of smoking tobacco and wherein the at least one cancer marker is selected from CEA, CA125, Cyfra 21-1, Pro-GRP, anti-NY-ESO-1, anti-p53, anti-Cyclin E2 and anti-MAPKAPK3; determining a multiple of median (MoM) score for each marker measured in the sample; summing the MoM score to obtain a composite score for each human subject, quantifying the increased risk for the presence of the lung cancer for the human subject as a risk score, wherein the composite score is matched to one of at least three risk categories of a grouping of stratified human subject populations wherein each risk category comprises a multiplier indicating increased likelihood of having the lung cancer correlated to a range of composite scores; and, providing a risk score for the human subject, whereby the quantified increased risk for the presence of the lung cancer in an asymptomatic human subject has been determined.
 31. The method of claim 30, wherein the panel of markers is selected from CEA, CA125, Cyfra 21-1, anti-NY-ESO-1, anti-p53 and anti-MAPKAPK3.
 32. The method of claim 30, wherein the sample is blood, blood serum, blood plasma, or a component thereof.
 33. The method of claim 30, wherein the composite score of the human subject is compared to the composite score of human subjects in at least three risk categories comprising those at high, intermediate, and low risk.
 34. The method of claim 30, wherein the composite score of the human subject is compared to the composite score of human subjects in at least five categories comprising those at highest, intermediate high risk, intermediate risk, intermediate low risk, and low risk.
 35. The method of claim 30, wherein the risk score is provided to a physician as the human subject's increased risk of having the presence of the cancer relative to a population before testing.
 36. The method of claim 30, wherein calculating the multiplier indicating increased likelihood of having the cancer for each risk category comprises stratifying the human cohort based on retrospective MoM scores and weighting a known prevalence of the cancer in the cohort by a positive predictive score for each stratified population.
 37. The method of claim 30, wherein the history of smoking tobacco comprises smoking at least 20 packs a year.
 38. A grouping of a stratified human subject cohort used to assess a risk level for the presence of a cancer in an asymptomatic human subject relative to a population, comprising: at least three risk categories, wherein each risk category comprises a risk score, a risk identifier and a range of composite scores, wherein a human subject's risk score is determined by comparing the human subject's calculated composite score to the range of composite scores which are correlated to a predetermined risk score.
 39. The grouping of a stratified human subject cohort of claim 38, wherein the risk identifier is selected from low risk, intermediate-low risk, intermediate risk, intermediate-high risk and highest risk.
 40. The grouping of a stratified human subject cohort of claim 38, wherein risk score comprises a multiplier indicating increased likelihood of having the cancer calculated for each risk category comprising stratifying the human population based on retrospective MoM scores and weighting a known prevalence of the cancer in the cohort by a positive predictive score for each stratified population.
 41. The grouping of stratified human subject populations of claim 38, wherein the composite scores are a sum of multiple of median (MoM) scores determined from a panel of markers for the cancer.
 42. The grouping of stratified human subject populations of claim 38, wherein the range of composite scores are derived from retrospective clinical samples.
 43. The grouping of stratified human subject populations of claim 38, wherein the groupings are in a form selected from a table form, a software application, a computer program, and an excel spreadsheet.
 44. The grouping of stratified human subject populations of claim 38, wherein the cancer is lung cancer and the range of composite scores were generated by measuring a panel of markers from retrospective clinical samples, wherein the panel of markers is selected from CEA, CA125, Cyfra 21-1, Pro-GRP, anti-NY-ESO-1, anti-p53, anti-Cyclin E2 and anti-MAPKAPK3.
 45. The grouping of stratified human subject populations of claim 38, wherein the measured marker values were further normalized and summed to generate the range of composite scores.
 46. A kit for use to assess a risk level for the presence of a cancer in an asymptomatic human subject relative to a population, comprising: reagents for measuring at least one cancer marker in a sample from the human subject; a risk categorization table; and an algorithm for determining a composite score for each sample and correlating to a predetermined risk score of the Risk Categorization Table, whereby the risk level for the presence of the cancer in an asymptomatic human subject relative to a population has been determined.
 47. The kit of claim 46, wherein the risk categorization table and algorithm are in a form of a software application, computer program or paper instructions for calculating the risk level of the presence of cancer in the human subject relative to a population.
 48. The kit of claim 46, wherein the cancer is selected from lung cancer, kidney cancer, breast cancer, bile duct cancer, bone cancer, pancreatic cancer, cervical cancer, colon cancer, colorectal cancer, gallbladder cancer, liver or hepatocellular cancer, ovarian cancer, testicular cancer, lobular carcinoma, prostate cancer, and skin cancer or melanoma.
 49. The kit of claim 46, wherein the cancer is lung cancer.
 50. The kit of claim 46, wherein the reagents are selected from antigens and antibodies for measuring biomarkers.